Abstract

Winter's measurement compression theorem stands as one of the most penetrating insights 
of quantum information theory. In addition to making an original and profound statement 
about measurement in quantum theory, it also underlies several other general protocols used for 
entanglement distillation and local purity distillation. The theorem provides for an asymptotic 
decomposition of any quantum measurement into noise and information. This decomposition 
leads to an optimal protocol for having a sender simulate many independent instances of a 
quantum measurement and send the measurement outcomes to a receiver, using as little com- 
munication as possible. The protocol assumes that the parties have access to some amount of 
common randomness, which is a strictly weaker resource than classical communication. 

In this paper, we provide a full review of Winter's measurement compression theorem, detail- 
ing the information processing task, giving examples for understanding it, reviewing Winter's 
achievability proof, and detailing a new approach to its single-letter converse theorem. We
prove an extension of the theorem to the case in which the sender is not required to receive the 
outcomes of the simulated measurement. The total cost of common randomness and classical 
communication can be lower for such a "non-feedback" simulation, and we prove a single-letter 
converse theorem demonstrating optimality. We then review the Devetak-Winter theorem on
classical data compression with quantum side information, providing new proofs of its achiev- 
ability and converse parts. From there, we outline a new protocol that we call "measurement 
compression with quantum side information," announced previously by two of us in our work 
on triple trade-offs in quantum Shannon theory. This protocol has several applications, includ- 
ing its part in the "classically-assisted state redistribution" protocol, which is the most general 
protocol on the static side of the quantum information theory tree, and its role in reducing 
the classical communication cost in a task known as local purity distillation. We also outline a 
connection between measurement compression with quantum side information and recent work 
on entropic uncertainty relations in the presence of quantum memory. Finally, we prove a single- 
letter theorem characterizing measurement compression with quantum side information when 
the sender is not required to obtain the measurement outcome. 



1 Introduction 



Measurement plays an important role in quantum theory. It is the interface between the macroscopic 
world of everyday experience and the quantum world, which is characterized by noncommutativity 
and superposition. The translation is imperfect, however, with superposition and noncommutativity 
leading necessarily to uncertainty in the outcomes of measurements. In any given measurement, 
there will be noise inherent to the measurement procedure, uncertainty due to the state being 
measured and, most importantly, information. The objective of this article is to explain how to 
separate out these components, precisely identifying and quantifying them in the data produced 
by a quantum measurement. To do so, it will be crucial to adopt an information-theoretic point 
of view, not just to provide the necessary techniques to solve the problem, but even to figure out 
how to properly formulate the question. 

If we are only concerned with capturing the statistics of the outcomes of a quantum measurement,
the most general mathematical description is to use the positive operator-valued measure
(POVM) formalism [21], [34], [3D]. In the POVM formalism, a quantum measurement is specified as a
set $\Lambda \equiv \{\Lambda_x\}$ of operators indexed by some classical label $x$ corresponding to the classical
outcomes of the measurement. These operators should be positive and form a resolution of the
identity on the Hilbert space of the system that is being measured:

$$\Lambda_x \geq 0 \quad \forall x, \qquad \sum_x \Lambda_x = I. \tag{1}$$

Given a quantum state described by a density operator $\rho$ (a positive, unit trace operator) and a
POVM $\Lambda$, a measurement of $\rho$ specified by $\Lambda$ induces a random variable $X$, and the probability
$p_X(x)$ for the classical outcome $x$ to occur is given by the Born rule:

$$p_X(x) = \mathrm{Tr}\{\Lambda_x \rho\}.$$

Positivity of the operators $\Lambda_x$ and $\rho$ guarantees positivity of the distribution $p_X(x)$, and the
facts that the set $\Lambda$ forms a resolution of the identity and that the density operator $\rho$ has unit
trace guarantee normalization of the distribution $p_X(x)$.
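The POVM conditions and the Born rule above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `is_povm` and `born_probabilities` and the noisy qubit measurement used as an example are illustrative choices, not taken from the paper.

```python
import numpy as np

def is_povm(elements, tol=1e-9):
    """Check that every element is positive and that the elements sum to the identity."""
    dim = elements[0].shape[0]
    positive = all(np.linalg.eigvalsh(e).min() >= -tol for e in elements)
    complete = np.allclose(sum(elements), np.eye(dim), atol=tol)
    return positive and complete

def born_probabilities(elements, rho):
    """Born rule: p_X(x) = Tr{Lambda_x rho} for every outcome x."""
    return np.array([np.trace(e @ rho).real for e in elements])

# Hypothetical example: a computational-basis qubit measurement whose
# outcome label is flipped with probability eps.
eps = 0.1
P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])
povm = [(1 - eps) * P0 + eps * P1, eps * P0 + (1 - eps) * P1]

# The pure state |+><+|, written as a density matrix.
rho = np.full((2, 2), 0.5)

probs = born_probabilities(povm, rho)
print(is_povm(povm), probs)
```

Positivity of the elements makes each probability non-negative, and completeness together with unit trace makes the probabilities sum to one, as stated above.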

The above definition of a POVM makes it clear that the set of all POVMs is a convex set, i.e.,
given a POVM $\Lambda = \{\Lambda_x\}$ and another $\Gamma = \{\Gamma_x\}$, with $0 < \lambda < 1$, the convex combination
$\lambda\Lambda + (1-\lambda)\Gamma \equiv \{\lambda\Lambda_x + (1-\lambda)\Gamma_x\}$ is also a POVM. The physical interpretation of this
convexity is that it might be possible to decompose any particular measuring apparatus into noise
and information. If an apparatus does not admit a decomposition of this form, then it is an
extremal POVM, lying on the boundary of the convex set. If it does, however, as in the above
example apparatus, one could first flip a biased coin with distribution $(\lambda, 1-\lambda)$ to determine
whether to perform $\Lambda$ or $\Gamma$ and then perform the corresponding measurement. The coin flip is a
source of noise because it is independent of the physical measurement outcome, and the
distribution for the outcome corresponds to the information. Decomposing an apparatus in this way
is a useful idea with many applications.
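As a small numerical illustration of this convexity, the sketch below mixes two qubit POVMs and confirms that the outcome distribution of the mixed apparatus equals the coin-flip average of the two component distributions. It assumes NumPy; the two measurements, the weight, and the state are hypothetical choices made only for the example.

```python
import numpy as np

def mix_povms(lam, povm_a, povm_b):
    """Element-wise convex combination lam * A_x + (1 - lam) * B_x of two POVMs."""
    return [lam * a + (1 - lam) * b for a, b in zip(povm_a, povm_b)]

def born(povm, rho):
    """Outcome distribution Tr{Lambda_x rho} of a POVM on the state rho."""
    return np.array([np.trace(e @ rho).real for e in povm])

# Hypothetical example: mix a Z-basis and an X-basis qubit measurement.
z_basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
x_basis = [np.full((2, 2), 0.5), np.array([[0.5, -0.5], [-0.5, 0.5]])]

lam = 0.3
mixed = mix_povms(lam, z_basis, x_basis)

rho = np.array([[0.8, 0.2], [0.2, 0.2]])  # an arbitrary valid qubit state

# Distribution of the mixed apparatus ...
p_mixed = born(mixed, rho)
# ... versus flipping a (lam, 1 - lam) coin and then measuring one component.
p_coin = lam * born(z_basis, rho) + (1 - lam) * born(x_basis, rho)
print(p_mixed, p_coin)
```

The agreement is just linearity of the trace, which is why the coin flip can be treated as noise that is independent of the measured system.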

To develop a robust quantitative theory, however, it is surprisingly effective to consider the above 
ideas from an information-theoretic standpoint, in the sense of Shannon [57]. In this context, that
approach will have three main features: a tolerance for small imperfections, a focus on asymptotics, 
and an emphasis on communication. To begin with, let us focus on the first two. From an 
operational point of view, there is little justification for requiring an exact convex decomposition 
of a given measurement. As long as any imperfections are very small, approximation by a convex
decomposition leads to experimentally indistinguishable consequences. Moreover, measurement 
statistics are most meaningful in a setting in which the measurement is repeated many times 
on identical state preparations. As such, it is sensible, and remarkably powerful, to ask about 
approximate convex decomposition of repeated measurements, with the permissible imperfection 
required to vanish in the limit of infinite repetitions. 

The relevance of communication is less immediate. In the example described above, the
measurement $\lambda\Lambda + (1-\lambda)\Gamma$ could be implemented by first flipping a coin and then either
measuring $\Lambda$ or $\Gamma$. This opens up the possibility of significantly compressing the measurement
outcomes because there will generically be less uncertainty about the outcome of either $\Lambda$ or
$\Gamma$ alone than the convex combination $\lambda\Lambda + (1-\lambda)\Gamma$. To formalize this notion, one could
imagine that two parties, traditionally named Alice and Bob, are trying to collectively implement
a measurement. They share some common random bits that can be used to perform the
$(\lambda, 1-\lambda)$ coin flip without communicating, and Alice holds the quantum system on which
$\lambda\Lambda + (1-\lambda)\Gamma$ is to be measured. Based on the result of the coin flip, Alice would apply
either $\Lambda$ or $\Gamma$ and compress the outcome as much as possible, minimizing the number of bits she
needs to send to Bob in order to allow him to reconstruct the outcome of the measurement.
Optimizing the number of bits required over all possible measurement simulation strategies, of
which we have only described one, then provides a robust operational measure of the amount of
information generated by the quantum measurement.

In a seminal paper, Winter successfully performed this information-theoretic analysis of
measurement, and in so doing, was able to make a profound and original statement about the nature
of information in quantum measurement [65]. The content of his "measurement compression
theorem" is the specification of an optimal two-dimensional rate region, characterizing the
resources needed for an asymptotically faithful simulation of a quantum measurement $\Lambda$ on a
state $\rho$ in terms of common randomness and classical communication. The sender (Alice) and
receiver (Bob) both obtain the outcome of the measurement, and as such, this is known as a
"feedback simulation" (terminology introduced in a different though related context [5]). His
measurement compression protocol achieves one important optimal rate pair in this region: if, to
first order, at least $nH(X|R)$ bits of common randomness are available, then it is possible to
simulate the measurement $\Lambda^{\otimes n}$ on the state $\rho^{\otimes n}$ with only about $nI(X;R)$ bits of
classical communication. We allow $n$, the number of repetitions of the measurement $\Lambda$, to go to
infinity, in which limit the simulation becomes asymptotically faithful. The entropies $H(X|R)$
and $I(X;R)$ are defined as

$$H(X|R) = H(XR) - H(R),$$
$$I(X;R) = H(X) - H(X|R),$$

with the von Neumann entropy of a state $\sigma$ defined as $H(\sigma) = -\mathrm{Tr}\{\sigma \log_2 \sigma\}$. The above
entropies are taken with respect to the state

$$\sum_x |x\rangle\langle x|^X \otimes \mathrm{Tr}_A\{(I^R \otimes \Lambda_x^A)\,\phi^{RA}\}, \tag{2}$$

where $\phi^{RA}$ is any purification of the state $\rho$, meaning that $\phi^{RA}$ is a rank-one density
operator satisfying $\mathrm{Tr}_R\{\phi^{RA}\} = \rho$. One can think of (2) as the post-measurement state,
including both the

1 Here and throughout this paper, we use superscripts such as A, B, R, and E to denote quantum systems with
corresponding Hilbert spaces $\mathcal{H}_A$, $\mathcal{H}_B$, $\mathcal{H}_R$, and $\mathcal{H}_E$. Such a labeling is useful in quantum information theory because
classical outcome of the measurement and the subsequent state of the reference system $R$. The other
important rate pair corresponds to Shannon's protocol. If no common randomness is available and
both the sender and receiver are to obtain the measurement outcome, then the lowest achievable
rate of classical communication is $H(X)$, the Shannon entropy of the distribution of measurement
outcomes in (2). Time-sharing between these two protocols, converting classical communication to
common randomness, and wasting common randomness then give all other optimal rate pairs. (See
Figure 3 for an example plot of the region.)
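To make the two corner points concrete, the following sketch evaluates $H(X)$, $H(X|R)$, and $I(X;R)$ for a single copy of a state and a POVM. It assumes NumPy, and the function names and the noisy qubit example are hypothetical. It uses the fact that the reference-system block of the state (2) for outcome $x$ has the same spectrum as $\sqrt{\rho}\,\Lambda_x\sqrt{\rho}$ (the two differ by a transpose), so all entropies can be computed from those operators.

```python
import numpy as np

def psd_sqrt(op):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(op)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def entropy_bits(op):
    """-Tr{op log2 op} from the eigenvalues (op may be subnormalized)."""
    ev = np.linalg.eigvalsh(op)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def rate_pair(povm, rho):
    """Return (H(X|R), I(X;R)) for the classical-quantum state in (2)."""
    s = psd_sqrt(rho)
    # Unnormalized reference states, up to a transpose that leaves spectra unchanged.
    blocks = [s @ e @ s for e in povm]
    p = np.array([np.trace(b).real for b in blocks])
    p = p[p > 1e-12]
    h_x = float(-np.sum(p * np.log2(p)))
    h_r = entropy_bits(sum(blocks))               # entropy of the reference state
    h_xr = sum(entropy_bits(b) for b in blocks)   # block-diagonal joint state
    h_x_given_r = h_xr - h_r
    return h_x_given_r, h_x - h_x_given_r

# Hypothetical example: a noisy Z measurement on the maximally mixed qubit state.
eps = 0.1
povm = [np.diag([1 - eps, eps]), np.diag([eps, 1 - eps])]
rho = np.eye(2) / 2

common_randomness_rate, communication_rate = rate_pair(povm, rho)
print(common_randomness_rate, communication_rate)
```

In this example the outcome is uniform, so $H(X) = 1$ bit, but the noise parameter contributes $h_2(\epsilon)$ bits that can be supplied by common randomness, leaving only $1 - h_2(\epsilon)$ bits of communication per copy.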

Winter's measurement compression protocol has an important place in the constellation of 
quantum Shannon theoretic protocols. It evolved from earlier work in Refs. [46, 66], and it is the
predecessor to the quantum reverse Shannon theorem, which was conjectured in Ref. [6] and proved 
later in Refs. [5], [10].² The quantum reverse Shannon theorem quantifies the noiseless resources
required to simulate a noisy quantum channel. Since Winter's measurement compression theorem 
applies to a quantum measurement and a quantum measurement is a special type of quantum chan- 
nel with quantum input and classical output, it is clear that the measurement compression protocol 
gives a special type of quantum reverse Shannon theorem. The quantum reverse Shannon theorem 
may seem on first encounter to correspond to a pointless task. After all, in the words of Ref. [6], 
why would we want to dilute fresh water into salt water? First appearances notwithstanding, it has 
at least two nearly immediate and significant information-theoretic applications: in proving strong 
converses [6], [64], [5], [10], [8] and in lossy data compression, otherwise known as rate distortion theory
[64], [44], [43], [19]. The connection to strong converses follows from a reductio ad absurdum argument:
if one were able to simulate a channel at a rate larger than its capacity, then it would be possible 
to bootstrap a channel code and a simulation code to achieve more communication than a noiseless 
channel would allow for. With the aid of an appropriate reverse Shannon theorem, one can then 
argue that coding at a rate beyond the capacity should make the error probability converge to 
one exponentially fast in the number of channel uses. The connection to rate distortion theory [7] 
follows from the observation that a reverse Shannon theorem achieves a task strictly stronger than 
the usual average distortion criterion considered in rate distortion theory. There, one requires that 
an information source be represented by the receiver up to some average distortion D > 0. If one 
were to simulate a channel on the information source that does not distort it by more than D on 
average, then clearly such a protocol would already satisfy the demands of rate distortion. 

There are two other useful applications of Winter's measurement compression theorem. The 
first is in local purity distillation [361 E3 1221 HI] , where the task is for two spatially separated parties 
to distill local pure states from an arbitrary bipartite mixed state $\rho^{AB}$ by using only local unitary
operations and classical communication. The measurement compression theorem is helpful in
determining the classical communication cost of such protocols, as considered in Ref. [41]. Another
application of measurement compression is in realizing the first step of the so-called "grandmother
protocol" of quantum information theory [26], where the objective is to distill entanglement from
a noisy bipartite state $\rho^{AB}$ with the help of noiseless classical and quantum communication. It

we often deal with states that are defined over many systems. We also use the shorthand $\phi \equiv |\phi\rangle\langle\phi|$ to denote a
pure-state density operator. So, for example, the state $\phi^{RA}$ is shared between systems A and R, implying that $\phi^{RA}$
is an operator acting on the tensor-product Hilbert space $\mathcal{H}_R \otimes \mathcal{H}_A$. We also freely identify Roman capital letters
W, X, Y, and Z with both random variables and quantum systems containing only classical data (as in (2)). There
should be no confusion here because these entities are in direct correspondence.

2 We should clarify here that, while Ref. [5] appeared on the arXiv in 2009, that article contains ideas developed
and publicized by the authors over a nine year period starting with the publication of Ref. [6] in 2001. Ref. [10]
features a different proof from that in Ref. [5], but it exploits many of the important ingredients developed in Ref. [5].
is possible to improve upon both of these protocols by exploiting one of the new measurement
compression theorems that we outline in this paper. 

Once one takes the first step of splitting the implementation of a measurement between Alice 
and Bob, it becomes natural to consider different notions of simulation. What if only Bob needs to 
get the outcome of the measurement, not Alice? What if Bob holds a quantum system entangled 
with the system being measured? These and related variations provide a very precise and diverse 
set of tools for analyzing the dichotomy between noise and information in quantum measurements. 
Beyond providing a detailed review of Winter's theorem, the main contribution of this article will 
be to develop these variations and generalizations of his original theorem. More specifically, our 
contributions are as follows: 

• We provide a full review of Winter's measurement compression theorem, detailing the basic 
information processing task, the statement of the theorem, Winter's achievability proof, and 
a simple converse theorem that demonstrates an optimal characterization of the rate region. 
We also review Winter's extension of the theorem to quantum instruments. 

• We extend Winter's measurement compression theorem to the setting in which the sender is 
not required to receive the outcome of the measurement simulation. Such a task is known 
as a "non-feedback" simulation, in analogy with a similar setting in the quantum reverse 
Shannon theorem [5]. A benefit of a "non-feedback" simulation is that the total cost of
common randomness and classical communication can be lower than that of a "feedback" 
simulation, leading to interesting, non-trivial trade-off curves for the rates of these resources. 
Also, we prove a single-letter converse theorem for this case, demonstrating that our protocol 
is optimal. 

• We then review Devetak and Winter's theorem regarding classical data compression with 
quantum side information (CDC-QSI) [27]. The setting of the problem is that an information
source distributes a random classical sequence to one party and a quantum state correlated 
with the sequence to another party. The objective is for the first party to transmit the 
classical sequence to the second party using as few noiseless classical bit channels as possible. 
As such, it is one particular quantum generalization of the classic Slepian-Wolf problem [58]. 
In the Slepian-Wolf protocol, the first party hashes the sequence received from the source, 
transmits the hash, and the second party uses his side information to search among all the 
sequences for any that are consistent with the hash and are a "reasonable cause" for his side 
information. We provide a novel achievability proof for CDC-QSI that is a direct quantization 
of this strategy, replacing the latter search with binary-outcome quantum measurements. We 
also provide a simple converse proof that is along the lines of the standard converses in 
Refs. US EDI • 

• The above reviews of measurement compression and CDC-QSI then prepare us for another
novel contribution: measurement compression in the presence of quantum side information
(MC-QSI). The setting for this new protocol is that a sender and receiver share many copies
of some bipartite state $\rho^{AB}$, and the sender would like to simulate the action of many
independent and identical measurements on the $A$ system according to some POVM $\Lambda$. The
protocol is a "feedback simulation," such that the sender also obtains the outcomes of the
measurement (though we still refer to it as MC-QSI for short). The MC-QSI protocol
combines ideas from the measurement compression theorem and CDC-QSI in order to reduce the
classical communication rate and common randomness needed to simulate the measurement.
The idea is that Alice performs the measurement compression protocol as she would before,
but she hashes the output of the simulated measurement and sends this along to Bob. Bob
then searches among all the post-measurement states that are consistent with the hash and
his share of the common randomness, similar to the way that he would in the CDC-QSI
protocol. The result is a reduction in the classical communication and common randomness
rate to $I(X;R|B)$ and $H(X|RB)$, respectively, where the entropies are with respect to the
following state:

$$\sum_x |x\rangle\langle x|^X \otimes \mathrm{Tr}_A\{(I^{RB} \otimes \Lambda_x^A)\,\phi^{RBA}\},$$

and $\phi^{RBA}$ is a purification of the state $\rho^{AB}$. These rates are what we would intuitively expect
of such a protocol: they are the same as in Winter's original theorem, except the entropic
quantities are conditioned on Bob's quantum side information in the system $B$.

• After developing MC-QSI, we briefly discuss three of its applications. The first is an
application that two of us announced in Ref. [38]: MC-QSI along with state redistribution [29], [68]
acts as a replacement for the "grandmother" protocol discussed above. The resulting protocol 
uses less classical and quantum communication and can in fact generate the grandmother by 
combining it with entanglement distribution. As such, MC-QSI and state redistribution form 
the backbone of the best known "static" protocols in quantum Shannon theory (though, one 
should be aware that these results are only optimal up to a regularization, so it could very well 
be that further improvements are possible) . The second application is an observation that the 
above protocol leads to a quantum reverse Shannon theorem for a quantum instrument, that 
is, a way to simulate the action of a quantum instrument on a quantum state by employing 
common randomness, classical communication, entanglement, and quantum communication. 
The third application is an improvement of the local purity distillation protocol from Ref. [JT] , 
so that we can lower the classical communication cost from $I(Y;BE)$ to $I(Y;E|B)$, as one
should expect when taking quantum side information into account. 

• We then discuss a way that we can relate recent work on entropic uncertainty relations with 
quantum side information [52], [59J [HI [31] to provide a lower bound on the classical resources 
required in two different complementary MC-QSI protocols. 

• Finally, we analyze the MC-QSI problem in the case where the sender is not required to receive 
the outcomes of the measurement simulation. For this non-feedback MC-QSI problem, we 
once again develop optimal protocols and find a single-letter characterization of the achievable 
rate region. While the necessary protocols are simply the natural combinations of those 
used in MC-QSI with those used for non-feedback MC, the optimality proof is different and 
remarkably subtle. 

All the simulation theorems appearing in this paper are "single-letter," meaning that we can 
calculate the optimal rate regions as simple entropic functions of one copy of the state or resource. 
This type of result occurs more often in quantum information theory when the resources considered 
are of a hybrid classical-quantum nature, as is our case here. The single-letter results here mean 
that we can claim to have a complete information-theoretic understanding of the tasks of MC, 
non-feedback MC, CDC-QSI, MC-QSI, and non-feedback MC-QSI. 
2 Measurement compression



This section provides a detailed review of the main results in Winter's original paper on
measurement compression [65], and it also serves to establish notation used in the rest of the paper.
Consider a quantum state $\rho$ and a POVM $\Lambda \equiv \{\Lambda_x\}_{x \in \mathcal{X}}$, such that $\Lambda_x \geq 0$ and
$\sum_x \Lambda_x = I$. Measuring the POVM $\Lambda$ on the state $\rho$ induces a random variable $X$ with the
following distribution $p_X(x)$:

$$p_X(x) = \mathrm{Tr}\{\Lambda_x \rho\}.$$

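Suppose that a quantum information source outputs many copies of the state $\rho$ and the POVM is
performed many times, producing the IID distribution $p_{X^n}(x^n)$ (where $x^n \equiv x_1 x_2 \cdots x_n$):

$$p_{X^n}(x^n) = \mathrm{Tr}\{\Lambda_{x^n} \rho^{\otimes n}\}$$
$$= \mathrm{Tr}\{(\Lambda_{x_1} \otimes \Lambda_{x_2} \otimes \cdots \otimes \Lambda_{x_n})(\rho \otimes \rho \otimes \cdots \otimes \rho)\}$$
$$= \prod_{i=1}^{n} \mathrm{Tr}\{\Lambda_{x_i} \rho\}.$$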

In order to communicate the result of the measurement to a receiver using a noiseless classical
channel, one could compress the data sequence $x^n$ using Shannon compression [15] and communicate
the sequence $x^n$ faithfully by transmitting only $nH(X)$ bits. Such a strategy is optimal if no other
resource is shared between the sender and receiver. But supposing that the sender and receiver
have access to some shared randomness (a fairly innocuous resource), would it be possible for the
sender to simulate the outcome of the measurement using some of the shared randomness and then
communicate fewer classical bits to the receiver in order for him to reconstruct the sequence $x^n$?
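The IID product form of the outcome distribution, and the Shannon rate $nH(X)$ that serves as the baseline, can be verified for small $n$ by brute force. This is a minimal sketch assuming NumPy; the POVM, the state, and the choice $n = 3$ are hypothetical.

```python
import itertools
from functools import reduce
import numpy as np

def kron_all(ops):
    """Tensor product of a list of operators, in order."""
    return reduce(np.kron, ops)

# Hypothetical single-copy POVM and state.
eps = 0.1
povm = [np.diag([1 - eps, eps]), np.diag([eps, 1 - eps])]
rho = np.array([[0.8, 0.2], [0.2, 0.2]])
n = 3

p_single = np.array([np.trace(e @ rho).real for e in povm])
rho_n = kron_all([rho] * n)

# Compare Tr{(Lambda_{x_1} x ... x Lambda_{x_n}) rho^{(x)n}} with the product of marginals.
max_gap = 0.0
for x_seq in itertools.product(range(len(povm)), repeat=n):
    joint = np.trace(kron_all([povm[x] for x in x_seq]) @ rho_n).real
    factored = np.prod(p_single[list(x_seq)])
    max_gap = max(max_gap, abs(joint - factored))

# Shannon entropy H(X) of the single-copy distribution, in bits.
shannon_entropy = float(-np.sum(p_single * np.log2(p_single)))
print(max_gap, n * shannon_entropy)
```

The quantity `n * shannon_entropy` is the communication cost of the Shannon-compression strategy described above, which the measurement compression protocol improves on when shared randomness is available.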

The goal of Winter's POVM compression protocol [65] is to do exactly that: accurately simulate
the distribution produced by the POVM, by exploiting shared randomness. The starting point for
Winter's protocol is the observation that any POVM $\Lambda$ can be decomposed as a convex combination
of some other POVMs $\{\Gamma^{(m)}\} = \{\{\Gamma_x^{(m)}\}\}$, such that

$$\Lambda_x = \sum_m p_M(m)\,\Gamma_x^{(m)}. \tag{3}$$

This is due to the fact that the set of all POVMs is a convex set.³ The set of POVMs $\{\Gamma^{(m)}\}$ then
provides a simulation of the original POVM $\Lambda$ by the following procedure:

1. Generate the variable $M$ according to the distribution $p_M(m)$.

2. Measure the state $\rho$ with the POVM $\Gamma^{(M)}$.

The resulting distribution for the random variable $X$, when marginalizing over the random
variable $M$, is then as follows:

$$\sum_m p_M(m)\,\mathrm{Tr}\{\Gamma_x^{(m)} \rho\} = \mathrm{Tr}\Big\{\sum_m p_M(m)\,\Gamma_x^{(m)} \rho\Big\} = \mathrm{Tr}\{\Lambda_x \rho\} = p_X(x).$$

Thus, the random variable $M$ is a source of noise for simulating the POVM $\Lambda$, and the output $X$
represents information. Separating these two components is a useful idea, and it is what allows us
to simulate a POVM by a protocol similar to the above.
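The two-step procedure above can be simulated directly by sampling. The sketch below, which assumes NumPy, draws $M$ from $p_M$, samples the outcome of the selected POVM, and compares the empirical distribution of $X$ with $\mathrm{Tr}\{\Lambda_x \rho\}$. The decomposition used (a projective measurement and the same measurement with flipped labels), the state, the sample size, and the seed are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decomposition of a noisy Z measurement into two projective ones:
# Gamma^(0) is the Z measurement, Gamma^(1) is the Z measurement with flipped labels.
eps = 0.1
P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])
gammas = [[P0, P1], [P1, P0]]
p_m = np.array([1 - eps, eps])

# Lambda_x = sum_m p_M(m) Gamma_x^(m), as in (3).
lambdas = [sum(p_m[m] * gammas[m][x] for m in range(2)) for x in range(2)]

rho = np.array([[0.8, 0.2], [0.2, 0.2]])
target = np.array([np.trace(e @ rho).real for e in lambdas])

# Conditional outcome distributions p(x | m) = Tr{Gamma_x^(m) rho}.
cond = np.array([[np.trace(g @ rho).real for g in gammas[m]] for m in range(2)])
exact_marginal = p_m @ cond

# Step 1: generate M. Step 2: measure with Gamma^(M) (binary outcome).
num_samples = 200_000
m_samples = rng.choice(2, size=num_samples, p=p_m)
u = rng.random(num_samples)
x_samples = (u >= cond[m_samples, 0]).astype(int)
empirical = np.bincount(x_samples, minlength=2) / num_samples
print(target, exact_marginal, empirical)
```

The exact marginal agrees with the target by linearity, and the sampled frequencies agree up to statistical fluctuations.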

3 See Ref. [18] for an explicit algorithm that decomposes any non-extremal POVM in this way. 
Figure 1: Ideal measurement compression. In an ideal protocol for measurement compression, Alice performs
the POVM $\Lambda = \{\Lambda_x\}$ on $n$ copies of the state $\rho$, which for the $i$th state leads to a quantum system $A_i'$ and a classical
output $X_i$. The goal of the protocol is to transmit the classical output $X^n$ to a receiver. Doing so perfectly would
require $n \log_2 |\mathcal{X}|$ bits of communication, where $\mathcal{X}$ is the alphabet for the random variable $X$. Winter's measurement
compression protocol gives a way of doing so by allowing for a small error but demanding that this error vanish in
the asymptotic limit of many copies of the state $\rho$. The idea is to simulate the measurement in such a way that a
third party would not be able to distinguish between the true measurement and the simulated one. An assumption
of this protocol is that the sender obtains the outcome of the simulated measurement in addition to the receiver.



2.1 Information processing task for measurement compression 

We can now define the information processing task for a measurement compression protocol. Given
the original POVM $\Lambda = \{\Lambda_x\}$, suppose that it acts on an $n$-fold tensor product state $\rho^{\otimes n}$. The
POVM then has the form

$$\Lambda^{\otimes n} \equiv \{\Lambda_{x^n}\}_{x^n \in \mathcal{X}^n},$$

where

$$\Lambda_{x^n} \equiv \Lambda_{x_1} \otimes \Lambda_{x_2} \otimes \cdots \otimes \Lambda_{x_n}.$$

The ideal measurement compression protocol would be for the sender Alice to simply perform this
measurement on each copy of her state and transmit the classical output to the receiver Bob.
Figure 1 depicts this ideal protocol.

Our goal is to find an approximate convex decomposition of the tensor-product POVM of the
sort in (3), but in this case it should have the form:

$$\Lambda_{x_1} \otimes \Lambda_{x_2} \otimes \cdots \otimes \Lambda_{x_n} \approx \tilde{\Lambda}_{x^n},$$

where

$$\tilde{\Lambda}_{x^n} \equiv \sum_m p_M(m)\,\Gamma_{x^n}^{(m)},$$

so that each POVM element $\Gamma_{x^n}^{(m)}$ is a collective measurement on the $n$-fold tensor product Hilbert
space. One might expect that such a collective measurement would have some compression
capabilities built into it, in the sense that it could reduce the number of bits needed to represent






Figure 2: Measurement compression protocol. The most general protocol for "feedback" measurement
compression that exploits common randomness and classical communication. Alice selects a POVM $\Gamma^{(m)} = \{\Gamma_l^{(m)}\}$
according to the common randomness $M$. She then performs this POVM on many copies of the state $\rho$, and receives
an outcome $l$ from it, modeled by the random variable $L$. She transmits the variable $L$ over $\log_2 |\mathcal{L}|$ noiseless classical
bit channels. Bob receives this variable, and by combining it with his share of the common randomness, he can
reconstruct the output $X^n$ of the simulated measurement. In a feedback simulation, the sender also reconstructs a
variable $X'^n$, which is the output of the simulated measurement. The goal of a feedback measurement compression
protocol is for the classical outputs of the simulated measurement to be statistically indistinguishable from the output
of the ideal measurement (this is from the perspective of someone holding both the reference systems and the classical
outputs).

the sequence $x^n$. Figure 2 depicts the most general protocol for measurement compression when
both sender and receiver are to obtain the outcome of the simulated measurement (known as a
"feedback" simulation).

We now make precise the above notion of the approximation of a POVM acting on a source
state. Suppose that there is some convex decomposition of the tensor-product source $\rho^{\otimes n}$ as

$$\rho^{\otimes n} = \sum_k p_K(k)\,\sigma_k, \tag{4}$$

where the states $\sigma_k$ are generally entangled states living on the $n$-fold tensor product Hilbert space.
Thus, one could view the preparation of the source as a selection of a random variable $K$ according
to $p_K(k)$, followed by a preparation of the state $\sigma_k$. There is then a joint distribution $p_{K,X^n}(k, x^n)$
for the selection of the source and the true measurement result:

$$p_{K,X^n}(k, x^n) = p_K(k)\,\mathrm{Tr}\{\Lambda_{x^n} \sigma_k\}, \tag{5}$$

and a joint distribution $p_{K,\tilde{X}^n}(k, x^n)$ for the selection of the source and the approximation
measurement's result:

$$p_{K,\tilde{X}^n}(k, x^n) = p_K(k) \sum_m p_M(m)\,\mathrm{Tr}\{\Gamma_{x^n}^{(m)} \sigma_k\}$$
$$= p_K(k)\,\mathrm{Tr}\{\tilde{\Lambda}_{x^n} \sigma_k\}.$$
Definition 1 (Faithful simulation) A sequence of protocols provides a faithful simulation of the
POVM $\Lambda$ on the source $\rho$, if for all decompositions of the source of the form in (4), the above joint
distributions are $\epsilon$-close in variational distance for all $\epsilon > 0$ and sufficiently large $n$:

$$\sum_{k, x^n} \left| p_{K,X^n}(k, x^n) - p_{K,\tilde{X}^n}(k, x^n) \right| \leq \epsilon. \tag{6}$$



The following lemma states a condition for faithful simulation that implies the above one, and 
it is the one that we will strive to meet when constructing a protocol for measurement compression. 



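Lemma 2 If for all $\epsilon > 0$ and sufficiently large $n$, it holds that

$$\sum_{x^n} \left\| \sqrt{\omega}\,(\Lambda_{x^n} - \tilde{\Lambda}_{x^n})\,\sqrt{\omega} \right\|_1 \leq \epsilon, \tag{7}$$

where $\omega \equiv \rho^{\otimes n}$,⁴ then the measurement simulation is faithful, in the sense that the above inequality
implies the following one for all decompositions of the source of the form in (4):

$$\sum_{k, x^n} \left| p_{K,X^n}(k, x^n) - p_{K,\tilde{X}^n}(k, x^n) \right| \leq \epsilon.$$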



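Proof. We rewrite the joint distribution $p_{K,X^n}(k, x^n)$ in (5) as follows:

$$p_{K,X^n}(k, x^n) = p_K(k)\,\mathrm{Tr}\{\Lambda_{x^n} \sigma_k\}$$
$$= \mathrm{Tr}\{(\sqrt{\omega}\,\Lambda_{x^n} \sqrt{\omega})(\omega^{-1/2}\, p_K(k)\,\sigma_k\, \omega^{-1/2})\}$$
$$= \mathrm{Tr}\{\sqrt{\omega}\,\Lambda_{x^n} \sqrt{\omega}\; S_k\},$$

where we define $S_k$ as

$$S_k \equiv \omega^{-1/2}\, p_K(k)\,\sigma_k\, \omega^{-1/2}.$$

Observe that the operators $S_k$ are positive and sum to the identity on the support of $\omega$. Thus, they
form a POVM $\{S_k\}$. Similarly, we can rewrite the joint distribution $p_{K,\tilde{X}^n}(k, x^n)$ as

$$p_{K,\tilde{X}^n}(k, x^n) = \mathrm{Tr}\{\sqrt{\omega}\,\tilde{\Lambda}_{x^n} \sqrt{\omega}\; S_k\}.$$

So we can rewrite and upper bound the simulation approximation condition in (6) as

$$\sum_{k, x^n} \left| p_{K,X^n}(k, x^n) - p_{K,\tilde{X}^n}(k, x^n) \right| = \sum_{k, x^n} \left| \mathrm{Tr}\{\sqrt{\omega}\,(\Lambda_{x^n} - \tilde{\Lambda}_{x^n})\,\sqrt{\omega}\; S_k\} \right|$$
$$\leq \sum_{x^n} \left\| \sqrt{\omega}\,(\Lambda_{x^n} - \tilde{\Lambda}_{x^n})\,\sqrt{\omega} \right\|_1,$$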

4 The trace norm $\|A\|_1$ of an operator $A$ is equal to $\|A\|_1 \equiv \mathrm{Tr}\{\sqrt{A^\dagger A}\}$. The trace distance $\|\rho - \sigma\|_1$ is commonly
used as a measure of distinguishability between the states $\rho$ and $\sigma$ because it is equal to $2(1 - 2p_e)$ where $p_e$ is the
probability of error in distinguishing these states if they are chosen uniformly at random.
where the inequality follows from the following chain of inequalities that hold for all Hermitian
operators $\tau$:

$$\sum_k |\mathrm{Tr}\{\tau S_k\}| = \sum_k |\mathrm{Tr}\{(\tau_+ - \tau_-) S_k\}|$$
$$\leq \sum_k |\mathrm{Tr}\{\tau_+ S_k\}| + |\mathrm{Tr}\{\tau_- S_k\}|$$
$$= \sum_k \mathrm{Tr}\{\tau_+ S_k\} + \mathrm{Tr}\{\tau_- S_k\}$$
$$= \mathrm{Tr}\{\tau_+\} + \mathrm{Tr}\{\tau_-\}.$$

In the above, we exploit the decomposition $\tau = \tau_+ - \tau_-$, where $\tau_+$ is the positive part of $\tau$ and $\tau_-$
is the negative part, and the fact that the operators $S_k$ form a POVM. The final expression equals
the trace norm $\|\tau\|_1$. ■
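The inequality $\sum_k |\mathrm{Tr}\{\tau S_k\}| \leq \|\tau\|_1$ used in this proof can be spot-checked on random instances. The following sketch assumes NumPy; the random constructions, dimensions, and seed are hypothetical. The random POVM is built with the same normalization that defines $S_k$ in the proof, namely conjugating positive operators by the inverse square root of their sum.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_povm(dim, num_outcomes, rng):
    """Random POVM S_k = T^{-1/2} G_k T^{-1/2}, with G_k >= 0 and T = sum_k G_k."""
    gs = []
    for _ in range(num_outcomes):
        a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
        gs.append(a @ a.conj().T)
    w, v = np.linalg.eigh(sum(gs))
    inv_sqrt = (v / np.sqrt(w)) @ v.conj().T
    return [inv_sqrt @ g @ inv_sqrt for g in gs]

def random_hermitian(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

dim, num_outcomes = 4, 6
worst_slack = np.inf
for _ in range(200):
    povm = random_povm(dim, num_outcomes, rng)
    tau = random_hermitian(dim, rng)
    lhs = sum(abs(np.trace(tau @ s)) for s in povm)
    rhs = np.linalg.norm(tau, ord="nuc")  # trace norm = sum of singular values
    worst_slack = min(worst_slack, rhs - lhs)
print(worst_slack)
```

A non-negative `worst_slack` (up to rounding) is consistent with the bound; this is only a sanity check on random instances and not a substitute for the proof.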

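We now introduce the quantum-to-classical measurement maps $\mathcal{M}_{\Lambda^{\otimes n}}$ and $\mathcal{M}_{\tilde{\Lambda}^n}$, defined as

$$\mathcal{M}_{\Lambda^{\otimes n}}(\sigma) \equiv \sum_{x^n} \mathrm{Tr}\{\Lambda_{x^n} \sigma\}\, |x^n\rangle\langle x^n|, \tag{8}$$
$$\mathcal{M}_{\tilde{\Lambda}^n}(\sigma) \equiv \sum_{x^n} \mathrm{Tr}\{\tilde{\Lambda}_{x^n} \sigma\}\, |x^n\rangle\langle x^n|, \tag{9}$$

where $|x^n\rangle\langle x^n| \equiv |x_1\rangle\langle x_1| \otimes |x_2\rangle\langle x_2| \otimes \cdots \otimes |x_n\rangle\langle x_n|$ and $\{|x\rangle\}$ is some orthonormal basis. By
introducing a purification $|\phi_\rho\rangle$ of the source $\rho$, we can then formulate another notion of faithful
simulation, as given in the following definition:

Definition 3 (Faithful simulation for purification) A sequence of protocols provides a faithful
simulation of the POVM $\Lambda$ on the source $\rho$, if for a purification $|\phi_\rho\rangle$ of the source, the states on
the reference and source systems after applying the measurement maps in (8) and (9) are $\epsilon$-close
in trace distance for all $\epsilon > 0$ and sufficiently large $n$:

$$\left\| (\mathrm{id} \otimes \mathcal{M}_{\Lambda^{\otimes n}})(\phi_\rho^{\otimes n}) - (\mathrm{id} \otimes \mathcal{M}_{\tilde{\Lambda}^n})(\phi_\rho^{\otimes n}) \right\|_1 \leq \epsilon. \tag{10}$$

In the above, it is implicit that the measurement maps act on the $n$ source systems and the identity
map acts on the $n$ reference systems.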

One might think that the above definition of faithful simulation is stronger than the condition in (7), but the following lemma demonstrates that they are equivalent.

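Lemma 4 (Faithful simulation equivalence) The notions of faithful simulation from Lemma 2 and Definition 3 are equivalent, in the sense that

$$\sum_{x^n} \left\| \sqrt{\omega}\,(\Lambda_{x^n} - \tilde{\Lambda}_{x^n})\,\sqrt{\omega} \right\|_1 = \left\| (\mathrm{id}^{\otimes n} \otimes \mathcal{M}_{\Lambda^{\otimes n}})(\phi_\rho^{\otimes n}) - (\mathrm{id}^{\otimes n} \otimes \mathcal{M}_{\tilde{\Lambda}^n})(\phi_\rho^{\otimes n}) \right\|_1, \tag{11}$$

for all states $\omega = \rho^{\otimes n}$, purifications $\phi_\rho^{\otimes n}$ of $\rho^{\otimes n}$, POVMs $\Lambda^{\otimes n}$ and $\tilde{\Lambda}^n$, and the resulting measurement maps $\mathcal{M}_{\Lambda^{\otimes n}}$ and $\mathcal{M}_{\tilde{\Lambda}^n}$.

Proof. We can prove this result by considering the single-copy case. Consider a state $\rho$, a purification $\phi_\rho$, and measurements $\{\Lambda_x\}$ and $\{\tilde{\Lambda}_x\}$. We choose the purification $\phi_\rho$ to be as follows:

$$|\phi_\rho\rangle^{RA} = \sqrt{d}\,(\sqrt{\rho}^{R} \otimes I^{A})\,|\Phi\rangle^{RA},$$

where $|\Phi\rangle^{RA}$ is the maximally entangled state:

$$|\Phi\rangle^{RA} = \frac{1}{\sqrt{d}} \sum_x |x\rangle^{R} |x\rangle^{A},$$

and $\{|x\rangle\}$ is an orthonormal basis that diagonalizes $\rho$ (this basis is not related to the one used in (9)). Then the unnormalized state after the measurement on $A$ is equal to

$$(I^{R} \otimes \sqrt{\Lambda_x}^{A})\,|\phi_\rho\rangle\langle\phi_\rho|^{RA}\,(I^{R} \otimes \sqrt{\Lambda_x}^{A}) = d\,(\sqrt{\rho}^{R} \otimes \sqrt{\Lambda_x}^{A})\,|\Phi\rangle\langle\Phi|^{RA}\,(\sqrt{\rho}^{R} \otimes \sqrt{\Lambda_x}^{A}). \tag{12}$$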

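Given the following "transpose trick" identity that holds for a maximally entangled state (and where the transpose is with respect to the basis chosen for $|\Phi\rangle$)

$$(I \otimes M)\,|\Phi\rangle = (M^{T} \otimes I)\,|\Phi\rangle,$$
$$\langle\Phi|\,(I \otimes M^{\dagger}) = \langle\Phi|\,(M^{*} \otimes I),$$

we then have that (12) is equal to

$$d\,\big((\sqrt{\rho}\,\sqrt{\Lambda_x}^{T})^{R} \otimes I^{A}\big)\,|\Phi\rangle\langle\Phi|^{RA}\,\big((\sqrt{\Lambda_x}^{*}\sqrt{\rho})^{R} \otimes I^{A}\big) = d\,\big((\sqrt{\rho}\,\sqrt{\Lambda_x}^{T})^{R} \otimes I^{A}\big)\,|\Phi\rangle\langle\Phi|^{RA}\,\big((\sqrt{\Lambda_x}^{T}\sqrt{\rho})^{R} \otimes I^{A}\big),$$

where the rightmost equivalence $\sqrt{\Lambda_x}^{*} = \sqrt{\Lambda_x}^{T}$ follows because $\Lambda_x$ is Hermitian. Tracing over the $A$ system then leaves the following unnormalized state on the reference system

$$\sqrt{\rho}\,\Lambda_x^{T}\,\sqrt{\rho}, \tag{13}$$

an observation first made in Ref.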

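Now consider that a measurement map $\mathrm{id} \otimes \mathcal{M}_\Lambda$ has the following action on the purification $|\phi_\rho\rangle$:

$$\begin{aligned}
(\mathrm{id} \otimes \mathcal{M}_\Lambda)(|\phi_\rho\rangle\langle\phi_\rho|) &= \sum_x \mathrm{Tr}_A\{(I^{R} \otimes \Lambda_x^{A})(|\phi_\rho\rangle\langle\phi_\rho|^{RA})\} \otimes |x\rangle\langle x|^{X} \\
&= \sum_x (\sqrt{\rho}\,\Lambda_x^{T}\sqrt{\rho})^{R} \otimes |x\rangle\langle x|^{X},
\end{aligned}$$

where the last line follows from the conclusion in (13). Thus, we have that

$$\begin{aligned}
\left\| (\mathrm{id} \otimes \mathcal{M}_\Lambda)(\phi_\rho) - (\mathrm{id} \otimes \mathcal{M}_{\tilde{\Lambda}})(\phi_\rho) \right\|_1
&= \Big\| \sum_x \sqrt{\rho}\,\Lambda_x^{T}\sqrt{\rho} \otimes |x\rangle\langle x| - \sum_x \sqrt{\rho}\,\tilde{\Lambda}_x^{T}\sqrt{\rho} \otimes |x\rangle\langle x| \Big\|_1 \\
&= \Big\| \sum_x \sqrt{\rho}\,(\Lambda_x - \tilde{\Lambda}_x)^{T}\sqrt{\rho} \otimes |x\rangle\langle x| \Big\|_1 \\
&= \sum_x \left\| \sqrt{\rho}\,(\Lambda_x - \tilde{\Lambda}_x)^{T}\sqrt{\rho} \right\|_1 \\
&= \sum_x \left\| \sqrt{\rho}\,(\Lambda_x - \tilde{\Lambda}_x)\sqrt{\rho} \right\|_1,
\end{aligned}$$

where the third equality follows because the trace norm of a block-diagonal operator is just the sum of the trace norms of the blocks. The fourth equality follows because the trace norm depends only on the singular values of a matrix, and these are invariant under transposition. ■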
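A numerical sanity check of the single-copy identity in Lemma 4 is sketched below, assuming NumPy is available. All helper names are hypothetical. The sketch builds the purification as $(I^R \otimes \sqrt{\rho}^A)\sum_x |x\rangle|x\rangle$, which is one valid purification of $\rho$ on system $A$ even when the computational basis does not diagonalize $\rho$; the right-hand side of (11) does not depend on this choice.

```python
import numpy as np

def trace_norm(a):
    return float(np.linalg.svd(a, compute_uv=False).sum())

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def random_density(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_povm(dim, outcomes, rng):
    gs = []
    for _ in range(outcomes):
        g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
        gs.append(g @ g.conj().T)
    w, v = np.linalg.eigh(sum(gs))
    inv_sqrt = v @ np.diag(w ** -0.5) @ v.conj().T
    return [inv_sqrt @ g @ inv_sqrt for g in gs]

def measured_state(phi, povm, dim):
    """(id_R x M_Lambda)(phi) = sum_x Tr_A{(I x Lambda_x) phi} x |x><x|."""
    n_out = len(povm)
    out = np.zeros((dim * n_out, dim * n_out), dtype=complex)
    for x, lam in enumerate(povm):
        op = (np.kron(np.eye(dim), lam) @ phi).reshape(dim, dim, dim, dim)
        ref = np.einsum('iaja->ij', op)  # partial trace over system A
        flag = np.zeros((n_out, n_out))
        flag[x, x] = 1.0                 # classical register |x><x|
        out += np.kron(ref, flag)
    return out

rng = np.random.default_rng(1)
dim, n_out = 3, 4
rho = random_density(dim, rng)
sqrt_rho = psd_sqrt(rho)
lam = random_povm(dim, n_out, rng)        # plays the role of Lambda
lam_tilde = random_povm(dim, n_out, rng)  # plays the role of the approximation

# Purification vector sum_x |x>_R (sqrt(rho)|x>)_A, ordered as R then A.
vec = np.kron(np.eye(dim), sqrt_rho) @ np.eye(dim).reshape(dim * dim)
phi = np.outer(vec, vec.conj())

lhs = sum(trace_norm(sqrt_rho @ (a - b) @ sqrt_rho) for a, b in zip(lam, lam_tilde))
rhs = trace_norm(measured_state(phi, lam, dim) - measured_state(phi, lam_tilde, dim))
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error, reflecting both the block-diagonal structure and the invariance of the trace norm under transposition.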



2.2 Measurement compression theorem 

We can now state Winter's main result: 



Theorem 5 (Measurement compression theorem) Let $\rho$ be a source state and $\Lambda$ a POVM to simulate on this state. A protocol for a faithful feedback simulation of the POVM with classical communication rate $R$ and common randomness rate $S$ exists if and only if the following set of inequalities holds

$$R \ge I(X;R),$$
$$R + S \ge H(X),$$

where the entropies are with respect to the state

$$\sum_x |x\rangle\langle x|^{X} \otimes \mathrm{Tr}_A\{(I^{R} \otimes \Lambda_x^{A})\,\phi^{RA}\}, \tag{14}$$

and $\phi$ is any purification of the state $\rho$.

Note that $I(X;R)$ and $H(X)$ are independent of the choice of purification $\phi^{RA}$. Moreover, the entropies are invariant with respect to transposition in the basis that diagonalizes $\rho$ so that we could instead evaluate entropies with respect to the following classical-quantum state:

$$\sum_x |x\rangle\langle x|^{X} \otimes \mathrm{Tr}_A\{(I^{R} \otimes (\Lambda_x^{T})^{A})\,\phi^{RA}\}.$$

Figure 3 provides a plot of the optimal rate region given in the above theorem for this case of a feedback simulation in which the sender also obtains the outcome of the measurement simulation.

After giving a simple example of an application of the above theorem, we prove it in two parts. First, we prove that there exists a measurement compression protocol achieving the rates in the above theorem, specifically the corner point $(S = H(X|R),\ R = I(X;R))$. Next, we prove the converse part of the theorem: that one cannot do better than the rates given in the above theorem.



2.2.1 Examples 

We review two simple examples to illustrate some applications of Theorem 5. Our first example is for the case that the initial state on which Alice performs the measurement is some pure state $\phi^{A}$. In this case, the state in (14) becomes

$$\sum_x |x\rangle\langle x|^{X} \otimes \mathrm{Tr}_A\{(I^{R} \otimes \Lambda_x^{A})(\psi^{R} \otimes \phi^{A})\} = \sum_x \mathrm{Tr}\{\Lambda_x \phi\}\,|x\rangle\langle x|^{X} \otimes \psi^{R}.$$

Thus, the reference has no correlations with the outcome of the measurement, so that $I(X;R) = 0$ and $H(X|R) = H(X)$, where $H(X)$ is the Shannon entropy of the distribution $p(x) = \mathrm{Tr}\{\Lambda_x \phi\}$. No classical communication is required in this case; common randomness suffices for this simulation. Indeed, the protocol just has Alice and Bob operate as in randomness dilution, whereby they dilute their uniform, shared randomness to match the distribution $p(x)$. The idea here is that there are no correlations with some reference system, or similarly, there is only a trivial decomposition of the form in (5), so that $K$ is a degenerate random variable.

Figure 3: Optimal rate region for measurement compression with feedback. The figure plots the optimal rate region from Theorem 5. The measurement compression protocol demonstrates that the rate pair $(S = H(X|R),\ R = I(X;R))$ is achievable. Wasting common randomness achieves all of the rate pairs to the right of this corner point. Time-sharing between measurement compression and Shannon compression $(S = 0,\ R = H(X))$ achieves all of the rate pairs between them. Finally, employing Shannon compression and converting the extra classical communication to common randomness achieves all of the optimal rate pairs along the line extending northwest from Shannon compression. The converse theorem in Section 2.4 proves that this rate region is optimal.

Our next example is a natural one discussed in the conclusion of Ref. [27]. Consider the POVM

$$\left\{ \tfrac{1}{2}|0\rangle\langle 0|,\ \tfrac{1}{2}|1\rangle\langle 1|,\ \tfrac{1}{2}|+\rangle\langle +|,\ \tfrac{1}{2}|-\rangle\langle -| \right\} \tag{15}$$

acting on the maximally mixed state $\pi^{A} = I^{A}/2$. We would like to determine the resources required to simulate the action of this measurement on the maximally mixed state. Consider that the Bell state

$$|\Phi\rangle^{RA} = \frac{1}{\sqrt{2}}\left(|00\rangle^{RA} + |11\rangle^{RA}\right) = \frac{1}{\sqrt{2}}\left(|{+}{+}\rangle^{RA} + |{-}{-}\rangle^{RA}\right)$$

is a purification of the maximally mixed state $\pi^{A}$. The post-measurement classical-quantum state in (14) for this case is as follows:

$$\frac{1}{4}\left( |0\rangle\langle 0|^{X} \otimes |0\rangle\langle 0|^{R} + |1\rangle\langle 1|^{X} \otimes |1\rangle\langle 1|^{R} + |2\rangle\langle 2|^{X} \otimes |+\rangle\langle +|^{R} + |3\rangle\langle 3|^{X} \otimes |-\rangle\langle -|^{R} \right).$$

A simple calculation reveals that the mutual information $I(X;R)$ of the above state is equal to one bit. Also, the conditional entropy $H(X|R)$ of the above state is equal to one bit. Thus, one bit of classical communication and one bit of common randomness are required to simulate this measurement.

For this case, the simulation is straightforward since the original POVM decomposes as a random choice of a $Z$ or $X$ Pauli measurement:

$$\tfrac{1}{2}\{|0\rangle\langle 0|,\ |1\rangle\langle 1|\}, \qquad \tfrac{1}{2}\{|+\rangle\langle +|,\ |-\rangle\langle -|\}.$$

Thus, Alice and Bob can use one bit of common randomness to select which measurement to perform. Alice then performs the $Z$ or $X$ measurement and sends the outcome to Bob using one classical bit channel. Bob can then determine which of the four outcomes in (15) has occurred by combining the two bits.
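The example above can be verified with exact arithmetic. The sketch below uses only the Python standard library; the function names and the outcome labeling $x = 2m + b$ (shared bit $m$, Alice's one-bit outcome $b$) are illustrative assumptions. It compares the unnormalized reference states produced by the ideal POVM (15) with those produced by the one-bit-randomness, one-bit-communication protocol, and evaluates $I(X;R)$ and $H(X|R)$ for the resulting classical-quantum state.

```python
from fractions import Fraction as F
from math import log2, sqrt

half = F(1, 2)
# Rank-one projectors onto |0>, |1>, |+>, |-> (all real, so transposes are trivial).
PROJ = [
    [[F(1), F(0)], [F(0), F(0)]],
    [[F(0), F(0)], [F(0), F(1)]],
    [[half, half], [half, half]],
    [[half, -half], [-half, half]],
]

def scale(c, a):
    return [[c * e for e in row] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def eigvals_sym2(a):
    """Eigenvalues of a real symmetric 2x2 matrix (closed form)."""
    t = float(a[0][0] + a[1][1])
    d = float(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    r = sqrt(max(t * t / 4 - d, 0.0))
    return [t / 2 - r, t / 2 + r]

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 1e-12)

# Ideal measurement on pi = I/2: outcome x leaves the reference in
# sqrt(pi) Lambda_x^T sqrt(pi) = (1/2) * (1/2) * |psi_x><psi_x|.
ideal = [scale(half * half, p) for p in PROJ]

# Simulation: shared bit m (probability 1/2) selects Z (m=0) or X (m=1);
# Alice's outcome b leaves the reference in (1/2)|psi><psi|; Bob outputs x = 2m + b.
simulated = [scale(half, scale(half, PROJ[2 * m + b])) for m in (0, 1) for b in (0, 1)]

p_x = [float(op[0][0] + op[1][1]) for op in ideal]  # outcome distribution
h_x = entropy(p_x)                                   # H(X)
avg_ref = [[F(0), F(0)], [F(0), F(0)]]
for op in ideal:
    avg_ref = add(avg_ref, op)
h_r = entropy(eigvals_sym2(avg_ref))                 # H(R)
h_r_given_x = sum(
    p * entropy(eigvals_sym2(scale(1 / F(op[0][0] + op[1][1]), op)))
    for p, op in zip(p_x, ideal)
)
mutual_info = h_r - h_r_given_x   # I(X;R)
cond_entropy = h_x - mutual_info  # H(X|R)
print(simulated == ideal, mutual_info, cond_entropy)
```

Because the conditional reference states are pure and their average is maximally mixed, $H(R|X) = 0$ and $H(R) = 1$, which is where the one bit of mutual information comes from.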

2.3 Achievability proof for measurement compression 

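The resource inequality [25, 26] characterizing measurement compression is as follows:

$$I(X;R)\,[c \to c] + H(X|R)\,[cc] \ge \langle \Lambda(\rho) \rangle,$$

where the entropic quantities are with respect to a state of the following form:

$$\sum_x |x\rangle\langle x|^{X} \otimes \mathrm{Tr}_A\{(I^{R} \otimes \Lambda_x^{A})\,\phi^{RA}\} = \sum_x p_X(x)\,|x\rangle\langle x|^{X} \otimes \hat{\rho}_x^{R},$$
$$\hat{\rho}_x^{R} = \mathrm{Tr}_A\{(I^{R} \otimes \Lambda_x^{A})\,\phi^{RA}\}/p_X(x),$$
$$p_X(x) = \mathrm{Tr}\{(I^{R} \otimes \Lambda_x^{A})\,\phi^{RA}\},$$

and $\phi^{RA}$ is some purification of the state $\rho$. The operators $\hat{\rho}_x^{R}$ take on the following special form

$$\hat{\rho}_x^{R} = \frac{1}{p_X(x)}\sqrt{\rho}\,\Lambda_x^{T}\sqrt{\rho},$$

if the spectral decomposition of $\rho$ is $\rho = \sum_x \lambda_x |x\rangle\langle x|$ and the purification $\phi^{RA}$ is taken as $|\phi\rangle = \sum_x \sqrt{\lambda_x}\,|x\rangle^{R}|x\rangle^{A}$. The meaning of the above resource inequality is that $nI(X;R)$ bits of classical communication $[c \to c]$ and $nH(X|R)$ bits of common randomness $[cc]$ are required in order to simulate the action of the POVM $\Lambda^{\otimes n}$ on the tensor product state $\rho^{\otimes n}$, and the simulation becomes exact in the asymptotic limit as $n \to \infty$.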

The main idea of the proof is to "steer" the state of the reference to be close to the ensemble produced by the ideal measurement. In order to do so, we construct a measurement at random, chosen from an ensemble of operators built from the ideal measurement $\Lambda$ and the state $\rho$. By employing the Ahlswede-Winter operator Chernoff bound [2], we can then guarantee that there exists a particular POVM satisfying the faithful simulation condition in (3), as long as the amount of classical communication and common randomness is sufficiently large.

The achievability part of the theorem begins by considering the following ensemble derived from the state $\rho$ and the POVM $\Lambda = \{\Lambda_x\}$:

$$p_X(x) = \mathrm{Tr}\{\Lambda_x \rho\},$$
$$\hat{\rho}_x = \frac{1}{p_X(x)}\sqrt{\rho}\,\Lambda_x\sqrt{\rho}.$$

(Recall our statement that the entropies are invariant with respect to transposition in the basis that diagonalizes $\rho$.) Observe that the expected density operator of this ensemble is just the state $\rho$:

$$\sum_x p_X(x)\,\hat{\rho}_x = \sum_x \sqrt{\rho}\,\Lambda_x\sqrt{\rho} = \rho.$$

We will prove that there exist POVMs $\Gamma^{(m)} = \{\Gamma^{(m)}_{x^n}\}_{x^n \in \mathcal{L}}$ with $m \in \mathcal{M}$,

$$|\mathcal{L}| = 2^{n[I(X;R)+3\delta]}, \tag{16}$$
$$|\mathcal{M}| = 2^{n[H(X|R)+\delta]}, \tag{17}$$

for some $\delta > 0$, such that the mixed POVM $\tilde{\Lambda}$ with elements $\tilde{\Lambda}_{x^n} = \frac{1}{|\mathcal{M}|}\sum_m \Gamma^{(m)}_{x^n}$ provides a faithful simulation of $\Lambda$ on $\rho$ according to the criterion in (7).

In order to prove achievability, we require the Ahlswede-Winter operator Chernoff bound, which we recall below:

Lemma 6 (Operator Chernoff Bound) Let $\xi_1, \ldots, \xi_M$ be $M$ independent and identically distributed random variables with values in the algebra $\mathcal{B}(\mathcal{H})$ of linear operators acting on some finite-dimensional Hilbert space $\mathcal{H}$. Each $\xi_m$ has all of its eigenvalues between zero and one, so that the following operator inequality holds

$$\forall m \in [M] : \quad 0 \le \xi_m \le I. \tag{18}$$

Let $\bar{\xi}$ denote the sample average of the $M$ random variables:

$$\bar{\xi} = \frac{1}{M}\sum_{m=1}^{M} \xi_m. \tag{19}$$

Suppose that the expectation $\mathbb{E}_\xi\{\xi_m\} = \mu$ of each operator $\xi_m$ exceeds the identity operator scaled by a number $a > 0$:

$$\mu \ge aI. \tag{20}$$

Then for every $\eta$ where $0 < \eta < 1/2$ and $a(1+\eta) \le 1$, we can bound the probability that the sample average $\bar{\xi}$ lies inside the operator interval $[(1 \pm \eta)\mu]$:

$$\Pr\{(1-\eta)\mu \le \bar{\xi} \le (1+\eta)\mu\} \ge 1 - 2\dim(\mathcal{H})\exp\!\left(-\frac{M\eta^2 a}{4\ln 2}\right). \tag{21}$$

Thus it is highly likely that the sample average operator $\bar{\xi}$ becomes close to the true expected operator $\mu$ as $M$ becomes large.
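To get a feel for the scaling in (21), the following standard-library sketch evaluates the right-hand side of the bound and inverts it for the number of samples $M$. The function names and the example parameter values are hypothetical; only the formula itself is taken from Lemma 6.

```python
from math import ceil, exp, log

def chernoff_lower_bound(m, dim, eta, a):
    """Right-hand side of (21): 1 - 2 dim exp(-M eta^2 a / (4 ln 2))."""
    if not (0 < eta < 0.5 and a > 0 and a * (1 + eta) <= 1):
        raise ValueError("need 0 < eta < 1/2, a > 0 and a(1 + eta) <= 1")
    return 1.0 - 2.0 * dim * exp(-m * eta ** 2 * a / (4.0 * log(2.0)))

def samples_needed(dim, eta, a, failure):
    """Smallest M for which (21) guarantees a failure probability <= `failure`."""
    return ceil(4.0 * log(2.0) * log(2.0 * dim / failure) / (eta ** 2 * a))

# Illustrative parameters (not from the paper).
dim, eta, a, failure = 4, 0.1, 0.2, 0.01
m_star = samples_needed(dim, eta, a, failure)
print(m_star, chernoff_lower_bound(m_star, dim, eta, a))
```

The required $M$ grows only logarithmically with the dimension but inversely with $\eta^2 a$, which is why the proof rescales the operators to make $a$ as large as possible.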

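We first define some operators that we will use to generate the POVM elements $\Gamma^{(m)}_{x^n}$. For all $x^n \in T_\delta^{X^n}$ (where $T_\delta^{X^n}$ is the strongly typical set; see Appendix A) consider the following positive operators with trace less than one:

$$\xi'_{x^n} = \Pi^n_{\rho,\delta}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\hat{\rho}_{x^n}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\Pi^n_{\rho,\delta}. \tag{22}$$

These operators $\xi'_{x^n}$ have a trace almost equal to one because

$$\begin{aligned}
\mathrm{Tr}\{\xi'_{x^n}\} &= \mathrm{Tr}\{\Pi^n_{\rho,\delta}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\hat{\rho}_{x^n}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\Pi^n_{\rho,\delta}\} \\
&\ge \mathrm{Tr}\{\Pi^n_{\rho,\delta}\,\hat{\rho}_{x^n}\} - \left\| \hat{\rho}_{x^n} - \Pi_{\hat{\rho}_{x^n},\delta}\,\hat{\rho}_{x^n}\,\Pi_{\hat{\rho}_{x^n},\delta} \right\|_1 \\
&\ge 1 - \epsilon - 2\sqrt{\epsilon}.
\end{aligned} \tag{23}$$

The first inequality follows from the trace inequality in Lemma 17 and the second inequality follows by appealing to the properties of quantum typicality reviewed in Appendix A. Also, we set $S$ to be the probability of the typical set $T_\delta^{X^n}$, and recall that this probability is near to one:

$$S = \Pr\{X^n \in T_\delta^{X^n}\} = \sum_{x^n \in T_\delta^{X^n}} p_{X^n}(x^n) \ge 1 - \epsilon.$$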

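We define $\xi'$ to be the expectation of the operators $\xi'_{x^n}$, when each one is chosen according to a pruned distribution $p_{X'^n}(x^n)$:

$$\xi' = \sum_{x^n} p_{X'^n}(x^n)\,\xi'_{x^n},$$

where we define $p_{X'^n}(x^n)$ as

$$p_{X'^n}(x^n) = \begin{cases} p_{X^n}(x^n)/S & : x^n \in T_\delta^{X^n} \\ 0 & : \text{otherwise} \end{cases}. \tag{24}$$

It follows that $\mathrm{Tr}\{\xi'\} \ge 1 - \epsilon - 2\sqrt{\epsilon}$ because

$$\mathrm{Tr}\{\xi'\} = \sum_{x^n} p_{X'^n}(x^n)\,\mathrm{Tr}\{\xi'_{x^n}\} \ge 1 - \epsilon - 2\sqrt{\epsilon}. \tag{25}$$

The inequality follows from the one in (23). From properties of quantum typicality, we know that

$$\Pi^n_{\rho,\delta}\,\rho^{\otimes n}\,\Pi^n_{\rho,\delta} \ge 2^{-n[H(\rho)+\delta]}\,\Pi^n_{\rho,\delta}. \tag{26}$$

We now define $\Pi$ to be the projector onto the subspace spanned by the eigenvectors of $\xi'$ with eigenvalue larger than $\epsilon\alpha$, where $\alpha = 2^{-n[H(R)+\delta]} = 2^{-n[H(\rho)+\delta]}$. Defining the operator $\Omega$ as

$$\Omega = \Pi\,\xi'\,\Pi, \tag{27}$$

it follows that $\mathrm{Tr}\{\Omega\} \ge 1 - 2\epsilon - 2\sqrt{\epsilon}$ because

$$\mathrm{rank}(\Omega) \le \mathrm{Tr}\{\Pi\} \le \mathrm{Tr}\{\Pi^n_{\rho,\delta}\} \le 2^{n[H(\rho)+\delta]} = \alpha^{-1},$$

so that eigenvalues smaller than $\epsilon\alpha$ contribute at most $\epsilon$ to $\mathrm{Tr}\{\Omega\}$, giving

$$\mathrm{Tr}\{\Omega\} \ge (1-\epsilon)\,\mathrm{Tr}\{\xi'\} \ge (1-\epsilon)(1 - \epsilon - 2\sqrt{\epsilon}) \ge 1 - 2\epsilon - 2\sqrt{\epsilon}. \tag{28}$$

We now exploit a random selection of the operators in (22) in order to build up a POVM that has desirable properties that we can use to prove the achievability part of this theorem. Let $\xi_{x^n}$ denote the following operators:

$$\xi_{x^n} = \Pi\,\xi'_{x^n}\,\Pi,$$

so that we confine them to be in the subspace onto which $\Pi$ projects. Define $|\mathcal{L}||\mathcal{M}|$ random variables $X^n(l,m)$ that are chosen independently according to the pruned distribution $p_{X'^n}(x^n)$. We can group these variables into $|\mathcal{M}|$ sets $\mathcal{C}_m = \{X^n(l,m)\}_{l \in \mathcal{L}}$, according to the value of the common randomness $m$. Under the pruned distribution, the expectation of the random operator $\xi_{X^n(l,m)}$ is equal to $\Omega$:

$$\mathbb{E}_{X^n(l,m)}\{\xi_{X^n(l,m)}\} = \sum_{x^n} p_{X'^n}(x^n)\,\xi_{x^n} = \Omega.$$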

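Let $E_m$ denote the event that the sample average of the operators in the $m$-th set $\mathcal{C}_m$ falls close to its mean (in the operator interval sense):

$$\Omega(1-\epsilon) \le \frac{1}{|\mathcal{L}|}\sum_{l} \xi_{X^n(l,m)} \le \Omega(1+\epsilon). \tag{29}$$

The above event $E_m$ is equivalent to the following rescaled event:

$$\beta\,\Omega(1-\epsilon) \le \frac{1}{|\mathcal{L}|}\sum_{l} \beta\,\xi_{X^n(l,m)} \le \beta\,\Omega(1+\epsilon),$$

where $\beta = 2^{n[H(R|X)-\delta]}$. It is then clear that the expectation of the operators $\beta\xi_{X^n(l,m)}$ satisfies the following operator inequality needed in the operator Chernoff bound:

$$\mathbb{E}_{X'^n}\{\beta\,\xi_{X^n(l,m)}\} = \beta\,\Omega \ge \alpha\beta\epsilon\,\Pi.$$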

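Also, each individual rescaled operator $\beta\xi_{x^n(l,m)}$ admits a tight operator upper bound with the identity operator because

$$\begin{aligned}
\beta\,\xi_{x^n(l,m)} &= 2^{n[H(R|X)-\delta]}\,\Pi\,\Pi^n_{\rho,\delta}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\hat{\rho}_{x^n}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\Pi^n_{\rho,\delta}\,\Pi \\
&\le 2^{n[H(R|X)-\delta]}\,2^{-n[H(R|X)-\delta]}\,\Pi\,\Pi^n_{\rho,\delta}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\Pi^n_{\rho,\delta}\,\Pi \\
&= \Pi\,\Pi^n_{\rho,\delta}\,\Pi_{\hat{\rho}_{x^n},\delta}\,\Pi^n_{\rho,\delta}\,\Pi \\
&\le I,
\end{aligned}$$

where we applied the operator inequality $\Pi_{\hat{\rho}_{x^n},\delta}\,\hat{\rho}_{x^n}\,\Pi_{\hat{\rho}_{x^n},\delta} \le 2^{-n[H(R|X)-\delta]}\,\Pi_{\hat{\rho}_{x^n},\delta}$ for the first inequality. Applying the operator Chernoff bound then gives us an upper bound on the probability that event $E_m$ does not occur:

$$\begin{aligned}
\Pr\{\neg E_m\} &\le 2\,\mathrm{rank}(\Pi)\exp\!\left(-\frac{|\mathcal{L}|\,\epsilon^2\,(\epsilon\alpha\beta)}{4\ln 2}\right) \\
&\le 2 \cdot 2^{n[H(R)+\delta]}\exp\!\left(-\frac{2^{n[I(X;R)+3\delta]}\,\epsilon^3\,2^{-n[H(R)+\delta]}\,2^{n[H(R|X)-\delta]}}{4\ln 2}\right) \\
&= 2 \cdot 2^{n[H(R)+\delta]}\exp\!\left(-\frac{2^{n\delta}\epsilon^3}{4\ln 2}\right) \\
&= 2\exp\!\left(-\frac{2^{n\delta}\epsilon^3}{4\ln 2} + n[H(R)+\delta]\ln 2\right).
\end{aligned}$$

Thus, by choosing $|\mathcal{L}|$ as we did in (16), it is possible to make the probability of the complement of $E_m$ doubly-exponentially small in $n$. Also, the above application of the operator Chernoff bound makes it clear why we rescale according to $\beta$: doing so allows for the rescaled operators $\beta\xi_{x^n(l,m)}$ to admit a tight operator upper bound with the identity (so that the demands of the operator Chernoff bound are met), while allowing for $|\mathcal{L}|$ to be as small as $2^{n[I(X;R)+3\delta]}$ and $\Pr\{\neg E_m\}$ to be arbitrarily small.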
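The exponent bookkeeping in this bound is easy to check by hand or by machine. The sketch below, using only the standard library, confirms that the exponents of $|\mathcal{L}|\alpha\beta$ combine to $n\delta$ and evaluates the logarithm of the final bound for illustrative values of $n$, $\delta$, $\epsilon$, and $H(R)$; the function names and the numbers are assumptions made for the example.

```python
from math import log

def code_exponent(i_xr, h_r, h_r_given_x, delta):
    """Exponent (per copy) of |L| * alpha * beta; equals delta when I = H(R) - H(R|X)."""
    return (i_xr + 3 * delta) - (h_r + delta) + (h_r_given_x - delta)

def log_failure_bound(n, delta, eps, h_r):
    """Natural log of 2 exp(-2^{n delta} eps^3 / (4 ln 2) + n [H(R) + delta] ln 2)."""
    return (log(2.0)
            - 2.0 ** (n * delta) * eps ** 3 / (4.0 * log(2.0))
            + n * (h_r + delta) * log(2.0))

# Illustrative values: H(R) = 1, H(R|X) = 0, so I(X;R) = 1 (as in the qubit example).
delta, eps, h_r = 0.1, 0.1, 1.0
print(code_exponent(1.0, h_r, 0.0, delta))
for n in (100, 200, 300):
    print(n, log_failure_bound(n, delta, eps, h_r))
```

For small $n$ the bound is vacuous (its logarithm is positive), but once $2^{n\delta}\epsilon^3$ dominates the linear term the failure probability collapses doubly exponentially.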

We now define a counting function $c_{x^n}(\mathcal{L},\mathcal{M})$ on the sets $\mathcal{L}$ and $\mathcal{M}$, which counts the fraction of occurrences of a sequence $x^n \in T_\delta^{X^n}$ in the set $\{x^n(l,m)\}_{l \in \mathcal{L}, m \in \mathcal{M}}$:

$$c_{x^n}(\mathcal{L},\mathcal{M}) = \frac{1}{|\mathcal{L}||\mathcal{M}|}\,\big|\{(l,m) : x^n(l,m) = x^n\}\big|.$$

(This is effectively a sample average of the counts.) When choosing the random variables $X^n(l,m)$ IID according to the pruned distribution, the expectation of the random counting function $C_{x^n}(\mathcal{L},\mathcal{M})$ is equal to the probability of the sequence $x^n$:

$$\mathbb{E}\{C_{x^n}(\mathcal{L},\mathcal{M})\} = p_{X'^n}(x^n).$$

Thus, for any sequence $x^n \in T_\delta^{X^n}$, the expectation of the counting function has the following lower bound:

$$\begin{aligned}
\mathbb{E}\{C_{x^n}(\mathcal{L},\mathcal{M})\} &= p_{X'^n}(x^n) \\
&= \frac{1}{S}\,p_{X^n}(x^n) \\
&\ge \min\{p_{X^n}(x^n) : x^n \in T_\delta^{X^n}\} \\
&\ge \gamma = 2^{-n[H(X)+\delta]},
\end{aligned}$$

where $S$, recall, is the probability that a random sequence $X^n$ is typical.
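As an illustration of the pruned distribution and the counting function, here is a toy sketch using only the standard library. The alphabet, the probabilities, and in particular the set labeled "typical" are hypothetical stand-ins chosen for readability; they are not derived from a typicality criterion.

```python
import random
from collections import Counter

def pruned_distribution(p, typical_set):
    """Restrict p to the typical set and renormalize by S = Pr{typical}."""
    s = sum(p[x] for x in typical_set)
    return {x: p[x] / s for x in typical_set}, s

def counting_function(codebook):
    """Fraction of codebook entries (l, m) whose codeword equals each sequence."""
    counts = Counter(codebook.values())
    return {x: c / len(codebook) for x, c in counts.items()}

# Toy two-letter sequences; the "typical set" is a hypothetical choice.
p = {"aa": 0.49, "ab": 0.21, "ba": 0.21, "bb": 0.09}
typical = ["aa", "ab", "ba"]
p_pruned, s = pruned_distribution(p, typical)

random.seed(0)
num_l, num_m = 50, 40  # stand-ins for |L| and |M|
seqs = random.choices(list(p_pruned), weights=list(p_pruned.values()), k=num_l * num_m)
codebook = {(l, m): seqs[l * num_m + m] for l in range(num_l) for m in range(num_m)}
c = counting_function(codebook)
print(p_pruned)
print(c)
```

With a few thousand codewords the empirical fractions already track the pruned distribution closely, which is the classical shadow of the event $E_0$ defined next.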

In order to appeal to the operator Chernoff bound (we could just use the classical one, but we instead choose to exploit the operator one), we define $P$ as a diagonal density operator of dimension $|T_\delta^{X^n}| \times |T_\delta^{X^n}|$, whose diagonal entries are just the entries of the pruned distribution $p_{X'^n}(x^n)$. Similarly, for a particular realization of the set $\{x^n(l,m)\}_{l \in \mathcal{L}, m \in \mathcal{M}}$, we can define $C$ as a diagonal density operator of the same dimension, whose diagonal entries are just the entries of the counting functions $c_{x^n}(\mathcal{L},\mathcal{M})$. From the above reasoning, it is then clear that the expectation of $C$ under a random choice of the $x^n(l,m)$ sequences is just $P$:

$$\mathbb{E}\{C\} = P.$$

Furthermore (again by the above reasoning), we can establish the following lower bound on $P$:

$$P \ge \gamma\,\Pi^n_{X,\delta},$$

where $\Pi^n_{X,\delta}$ is a typical projector corresponding to the distribution $p_X(x)$. Let $E_0$ be the event that the operator $C$ is within $\epsilon$ of its mean $P$:

$$(1-\epsilon)P \le C \le (1+\epsilon)P. \tag{30}$$

By appealing to the operator Chernoff bound, we can bound the probability that the above event does not occur when choosing the sequences $x^n(l,m)$ randomly as prescribed above:

$$\begin{aligned}
\Pr\{\neg E_0\} &\le 2\,\mathrm{rank}(\Pi^n_{X,\delta})\exp\!\left(-\frac{|\mathcal{L}||\mathcal{M}|\,\epsilon^2\gamma}{4\ln 2}\right) \\
&\le 2 \cdot 2^{n[H(X)+\delta]}\exp\!\left(-\frac{2^{n[I(X;R)+3\delta]}\,2^{n[H(X|R)+\delta]}\,\epsilon^2\,2^{-n[H(X)+\delta]}}{4\ln 2}\right) \\
&= 2 \cdot 2^{n[H(X)+\delta]}\exp\!\left(-\frac{2^{n3\delta}\epsilon^2}{4\ln 2}\right) \\
&= 2\exp\!\left(-\frac{2^{n3\delta}\epsilon^2}{4\ln 2} + n[H(X)+\delta]\ln 2\right).
\end{aligned}$$

Thus, by choosing $|\mathcal{L}|$ and $|\mathcal{M}|$ as we did in (16)-(17), it is possible to make this probability be doubly-exponentially small in $n$.

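We want to ensure that it is possible for all of the events $E_m$ and $E_0$ to occur simultaneously. We can guarantee this by applying DeMorgan's law, the union bound, and the above estimates:

$$\begin{aligned}
\Pr\Big\{\neg\Big(E_0 \cap \bigcap_m E_m\Big)\Big\} &= \Pr\Big\{\neg E_0 \cup \bigcup_m \neg E_m\Big\} \\
&\le \Pr\{\neg E_0\} + \sum_m \Pr\{\neg E_m\} \\
&\le 2\exp\!\left(-\frac{2^{n3\delta}\epsilon^2}{4\ln 2} + n[H(X)+\delta]\ln 2\right) + |\mathcal{M}|\,2\exp\!\left(-\frac{2^{n\delta}\epsilon^3}{4\ln 2} + n[H(R)+\delta]\ln 2\right),
\end{aligned} \tag{31}$$

which becomes arbitrarily small as $n \to \infty$. (Thus, it is in fact overwhelmingly likely for our desired conditions to hold if we choose $|\mathcal{L}|$ and $|\mathcal{M}|$ as we did in (16)-(17).)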



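So, assume now that we have a set $\{x^n(l,m)\}_{l \in \mathcal{L}, m \in \mathcal{M}}$ such that the corresponding operators $\{\xi_{x^n(l,m)}\}$ and $C$ satisfy the conditions in (29) and (30). We can now construct from them a set of POVMs $\{\Gamma^{(m)}\}$ that will perform a faithful measurement simulation. We define the POVM elements $\Gamma^{(m)}_{x^n}$ of $\Gamma^{(m)}$ as follows (with $\omega = \rho^{\otimes n}$):

$$\begin{aligned}
\Gamma^{(m)}_{x^n} &= \frac{S}{1+\epsilon}\,\omega^{-1/2}\left(\frac{1}{|\mathcal{L}|}\sum_{l\,:\,x^n(l,m)=x^n} \xi_{x^n(l,m)}\right)\omega^{-1/2} \\
&= \frac{S}{1+\epsilon}\,\frac{|\{l : x^n(l,m) = x^n\}|}{|\mathcal{L}|}\,\omega^{-1/2}\,\xi_{x^n}\,\omega^{-1/2}.
\end{aligned} \tag{32}$$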



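We check that for each value $m$ of the common randomness these operators form a sub-POVM (a set of positive operators whose sum is upper bounded by the identity). Indeed, we can appeal to the fact that the operators $\xi_{x^n(l,m)}$ satisfy the condition in (29):

$$\begin{aligned}
\sum_{x^n \in \mathcal{X}^n} \omega^{1/2}\,\Gamma^{(m)}_{x^n}\,\omega^{1/2} &= \frac{S}{1+\epsilon}\,\frac{1}{|\mathcal{L}|}\sum_{x^n \in \mathcal{X}^n}\ \sum_{l\,:\,x^n(l,m)=x^n} \xi_{x^n(l,m)} \\
&= \frac{S}{1+\epsilon}\,\frac{1}{|\mathcal{L}|}\sum_{l} \xi_{x^n(l,m)} \\
&\le \frac{S}{1+\epsilon}\,\Omega(1+\epsilon) \\
&= S\,\Omega,
\end{aligned}$$

where the inequality appeals to (29). Continuing with the definition of $\Omega$ in (27), we have

$$\begin{aligned}
S\,\Omega &\le \sum_{x^n} p_{X^n}(x^n)\,\Pi\,\Pi^n_{\rho,\delta}\,\hat{\rho}_{x^n}\,\Pi^n_{\rho,\delta}\,\Pi \\
&= \Pi\,\Pi^n_{\rho,\delta}\,\rho^{\otimes n}\,\Pi^n_{\rho,\delta}\,\Pi \\
&\le \rho^{\otimes n} = \omega.
\end{aligned}$$

The first inequality follows from the operator inequality $\Pi_{\hat{\rho}_{x^n},\delta}\,\hat{\rho}_{x^n}\,\Pi_{\hat{\rho}_{x^n},\delta} \le \hat{\rho}_{x^n}$ (the projectors $\Pi_{\hat{\rho}_{x^n},\delta}$ are defined with respect to the eigenbasis of $\hat{\rho}_{x^n}$). We can then conclude that these operators form a sub-POVM because

$$\sum_{x^n \in \mathcal{X}^n} \omega^{1/2}\,\Gamma^{(m)}_{x^n}\,\omega^{1/2} \le \omega \quad \Longrightarrow \quad \sum_{x^n \in \mathcal{X}^n} \Gamma^{(m)}_{x^n} \le I.$$

By filling up the rest of the space with some extra operator

$$\Gamma^{(m)}_{0} = I - \sum_{x^n} \Gamma^{(m)}_{x^n},$$

we then have a valid POVM.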

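Note that we have chosen the measurement operators $\Gamma^{(m)}_{x^n}$ so that there is a correspondence between a sequence $x^n$ and a measurement outcome. In the communication paradigm, though, we would like to have the measurement output some index $l$ that Alice can send over noiseless classical channels to Bob, so that he can subsequently construct the sequence $x^n(l,m)$ from the value of $l$, the common randomness $m$, and the codebook $\{x^n(l,m)\}$. So, for the communication paradigm, we can also consider the measurement to be of the form

$$\Gamma^{(m)}_{l} = \frac{S}{1+\epsilon}\,\frac{1}{|\mathcal{L}|}\,\omega^{-1/2}\,\xi_{x^n(l,m)}\,\omega^{-1/2}. \tag{33}$$

The POVM in (32) then just results by computing $x^n$ from the codeword $x^n(l,m)$.

We define the operators $\tilde{\Lambda}_{x^n}$ as

$$\tilde{\Lambda}_{x^n} = \frac{1}{|\mathcal{M}|}\sum_m \Gamma^{(m)}_{x^n},$$

or equivalently as

$$\tilde{\Lambda}_{x^n} = \frac{1}{|\mathcal{L}||\mathcal{M}|}\sum_{l,m} \mathcal{I}(x^n = x^n(l,m))\,\frac{S}{1+\epsilon}\,\omega^{-1/2}\,\xi_{x^n(l,m)}\,\omega^{-1/2},$$

where $\mathcal{I}(x^n = x^n(l,m))$ is an indicator function. We now check that the constructed POVM satisfies the condition in (7) for a faithful simulation:

$$\begin{aligned}
\sum_{x^n} \left\| \sqrt{\omega}\,(\Lambda_{x^n} - \tilde{\Lambda}_{x^n})\,\sqrt{\omega} \right\|_1
&= \sum_{x^n} \Big\| p_{X^n}(x^n)\,\hat{\rho}_{x^n} - \frac{1}{|\mathcal{L}||\mathcal{M}|}\sum_{l,m} \mathcal{I}(x^n = x^n(l,m))\,\frac{S}{1+\epsilon}\,\xi_{x^n(l,m)} \Big\|_1 \\
&= \sum_{x^n} \Big\| p_{X^n}(x^n)\,\hat{\rho}_{x^n} - \frac{S}{1+\epsilon}\,\frac{|\{l,m : x^n(l,m) = x^n\}|}{|\mathcal{L}||\mathcal{M}|}\,\xi_{x^n} \Big\|_1 \\
&= \sum_{x^n \notin T_\delta^{X^n}} \left\| p_{X^n}(x^n)\,\hat{\rho}_{x^n} \right\|_1 + \sum_{x^n \in T_\delta^{X^n}} \Big\| p_{X^n}(x^n)\,\hat{\rho}_{x^n} - \frac{S}{1+\epsilon}\,\frac{|\{l,m : x^n(l,m) = x^n\}|}{|\mathcal{L}||\mathcal{M}|}\,\xi_{x^n} \Big\|_1 \\
&\le \epsilon + \sum_{x^n \in T_\delta^{X^n}} \Big\| p_{X^n}(x^n)\,\hat{\rho}_{x^n} - p_{X^n}(x^n)\,\xi_{x^n} + p_{X^n}(x^n)\,\xi_{x^n} - \frac{S}{1+\epsilon}\,\frac{|\{l,m : x^n(l,m) = x^n\}|}{|\mathcal{L}||\mathcal{M}|}\,\xi_{x^n} \Big\|_1 .
\end{aligned}$$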

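The third equality above follows because the operators $\xi_{x^n}$ are defined to be zero when $x^n \notin T_\delta^{X^n}$. Then, the bound in the last line follows from typicality ($\Pr\{X^n \notin T_\delta^{X^n}\} \le \epsilon$). Continuing, we upper bound as

$$\begin{aligned}
&\le \epsilon + \sum_{x^n \in T_\delta^{X^n}} p_{X^n}(x^n)\,\|\hat{\rho}_{x^n} - \xi_{x^n}\|_1 + \sum_{x^n \in T_\delta^{X^n}} \Big| p_{X^n}(x^n) - \frac{S}{1+\epsilon}\,\frac{|\{l,m : x^n(l,m) = x^n\}|}{|\mathcal{L}||\mathcal{M}|} \Big| \\
&\le \epsilon + \sum_{x^n \in T_\delta^{X^n}} \frac{p_{X^n}(x^n)}{S}\,\|\hat{\rho}_{x^n} - \xi_{x^n}\|_1 + \sum_{x^n \in T_\delta^{X^n}} \Big| \frac{p_{X^n}(x^n)}{S} - \frac{1}{1+\epsilon}\,\frac{|\{l,m : x^n(l,m) = x^n\}|}{|\mathcal{L}||\mathcal{M}|} \Big| \\
&= \epsilon + \sum_{x^n \in T_\delta^{X^n}} \frac{p_{X^n}(x^n)}{S}\,\|\hat{\rho}_{x^n} - \xi_{x^n}\|_1 + \Big\| P - \frac{1}{1+\epsilon}\,C \Big\|_1 .
\end{aligned}$$

The first inequality is the triangle inequality (together with $\|\xi_{x^n}\|_1 \le 1$), and the second inequality follows by dividing the rightmost two terms by $S$. The equality follows by invoking the definitions of the operators $P$ and $C$. We handle these two remaining terms individually. Consider that

$$\begin{aligned}
\Big\| P - \frac{1}{1+\epsilon}\,C \Big\|_1 &= \frac{1}{1+\epsilon}\,\big\| (1+\epsilon)P - C \big\|_1 \\
&\le \frac{1}{1+\epsilon}\left( \|\epsilon P\|_1 + \|P - C\|_1 \right) \\
&\le \frac{2\epsilon}{1+\epsilon} \\
&\le 2\epsilon,
\end{aligned} \tag{34}$$

which follows from the triangle inequality, the fact that $P$ is a density operator, and that $P$ and $C$ satisfy (30). Consider the other term:

$$\begin{aligned}
\sum_{x^n \in T_\delta^{X^n}} \frac{p_{X^n}(x^n)}{S}\,\|\hat{\rho}_{x^n} - \xi_{x^n}\|_1
&= \sum_{x^n \in T_\delta^{X^n}} p_{X'^n}(x^n)\,\|\hat{\rho}_{x^n} - \xi'_{x^n} + \xi'_{x^n} - \xi_{x^n}\|_1 \\
&\le \sum_{x^n \in T_\delta^{X^n}} p_{X'^n}(x^n)\,\|\hat{\rho}_{x^n} - \xi'_{x^n}\|_1 + \sum_{x^n \in T_\delta^{X^n}} p_{X'^n}(x^n)\,\|\xi'_{x^n} - \xi_{x^n}\|_1 \\
&\le 2\sqrt{\epsilon'} + 2\sqrt{\epsilon''},
\end{aligned} \tag{35}$$

where we apply the triangle inequality in the third line. For the first bound with $\epsilon'$, we apply the Gentle Operator Lemma (Lemma 15) to the condition in (23), with $\epsilon' = \epsilon + 2\sqrt{\epsilon}$. For the second bound with $\epsilon''$, we exploit the equality $\xi_{x^n} = \Pi\,\xi'_{x^n}\,\Pi$ and apply the Gentle Operator Lemma for ensembles (Lemma 16) to the condition

$$\begin{aligned}
\sum_{x^n} p_{X'^n}(x^n)\,\mathrm{Tr}\{\Pi\,\xi'_{x^n}\,\Pi\} &= \sum_{x^n} p_{X'^n}(x^n)\,\mathrm{Tr}\{\xi_{x^n}\} \\
&= \mathrm{Tr}\{\Omega\} \\
&\ge 1 - \epsilon'',
\end{aligned}$$

which we proved before in (28) (with $\epsilon'' = 2\epsilon + 2\sqrt{\epsilon}$). This concludes the proof of the achievability part of the measurement compression theorem with feedback.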



2.4 Converse theorem for measurement compression 

This section provides a proof of a version of the converse theorem, which states that the only achievable rates $R$ and $S$ of classical communication and common randomness consumption, respectively, are in the rate region given in Theorem 5. We note that Winter proved a strong version of the converse theorem [65], which states that the error probability converges exponentially to one as $n$ becomes large. Winter's strong converse implies that the boundary of the rate region in Theorem 5 is a very sharp dividing line. Here, for the sake of simplicity, we stick to the proof of the "weak" converse, which only bounds the error probability away from zero. The reader can consult Section IV of Ref. [65] for details of Winter's strong converse proof.

The converse theorem states that the "single-letter" quantities in the rate region in Theorem 5 are optimal. A nice consequence is that there is no need to evaluate an intractable regularization of the associated region, as is often the case for many coding theorems in quantum Shannon theory [60]. The theorem truly provides a complete understanding of the measurement compression task from an information-theoretic perspective.

We now prove the weak converse. Figure 2 depicts the most general protocol for measurement compression with feedback, and it proves to be useful here to consider a purification of the original input state. The protocol begins with the reference and Alice possessing the joint system $R^n A^n$ and Alice sharing the common randomness $M$ with Bob. She then performs a simulation of the measurement, outputting a random variable $L$ and another random variable $X'^n$ that acts as the measurement output on her side. She sends $L$ to Bob, and Bob produces $X^n$ from $L$ and the common randomness $M$. If the protocol is any good for measurement compression with feedback, then the resulting state $\omega^{R^n X^n X'^n}$ should be $\epsilon$-close in trace distance to the ideal state $\sigma^{R^n X^n \bar{X}^n}$ (the state resulting from the ideal protocol in Figure 1), where $\bar{X}^n$ is a copy of the variable $X^n$:

$$\left\| \omega^{R^n X^n X'^n} - \sigma^{R^n X^n \bar{X}^n} \right\|_1 \le \epsilon. \tag{36}$$

We now prove the first lower bound on the classical communication rate $R$:

$$\begin{aligned}
nR &\ge H(L) \\
&\ge I(L; MR^n) \\
&= I(LM; R^n) + I(L; M) - I(R^n; M) \\
&\ge I(LM; R^n) \\
&\ge I(X^n; R^n)_\omega \\
&\ge I(X^n; R^n)_\sigma - n\epsilon' \\
&= nI(X;R) - n\epsilon'.
\end{aligned}$$

The first inequality follows because the entropy of a uniform random variable is larger than the entropy of any other random variable. The second inequality follows because $I(L; MR^n) = H(L) - H(L|MR^n)$ and $H(L|MR^n) \ge 0$ for a classical variable $L$. The first equality is an easily verified identity for mutual information. The third inequality follows because the common randomness $M$ is not correlated with the reference $R^n$ (and hence $I(R^n; M) = 0$) and because $I(L; M) \ge 0$. The fourth inequality is from quantum data processing (Bob processes $L$ and $M$ to get $X^n$). The fifth inequality is from (36) and continuity of quantum mutual information (the Alicki-Fannes inequality [3]), where $\epsilon'$ is some function $f(\epsilon)$ such that $\lim_{\epsilon \to 0} f(\epsilon) = 0$. The final equality follows because the ideal state $\sigma$ is a tensor-power state, and thus the mutual information $I(X^n; R^n)_\sigma$ is additive.
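The "easily verified identity" used in the third line, $I(L;MR^n) = I(LM;R^n) + I(L;M) - I(R^n;M)$, is a consequence of the chain rule. The sketch below checks it numerically with the standard library only, under the simplifying assumption that the reference is replaced by a classical random variable; the alphabet sizes, the random joint distribution, and the function names are all illustrative.

```python
import itertools
import random
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a joint distribution onto the coordinates listed in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(AB) for coordinate tuples a and b."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

random.seed(0)
keys = list(itertools.product(range(3), range(2), range(3)))  # triples (l, m, r)
weights = [random.random() for _ in keys]
total = sum(weights)
joint = {k: w / total for k, w in zip(keys, weights)}

L, M, R = (0,), (1,), (2,)
lhs = mutual_information(joint, L, M + R)
rhs = (mutual_information(joint, L + M, R) + mutual_information(joint, L, M)
       - mutual_information(joint, R, M))
print(lhs, rhs)
```

The identity holds for any joint distribution; in the converse it is only the additional facts $I(R^n;M) = 0$ and $I(L;M) \ge 0$ that turn it into the inequality $I(L;MR^n) \ge I(LM;R^n)$.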

A proof for the lower bound on the sum rate $R + S$ goes as follows:

$$\begin{aligned}
n(R + S) &\ge H(LM) \\
&\ge I(X'^n; LM) \\
&\ge I(X'^n; X^n)_\omega \\
&\ge I(\bar{X}^n; X^n)_\sigma - n\epsilon' \\
&= H(X^n)_\sigma - n\epsilon' \\
&= nH(X) - n\epsilon'.
\end{aligned}$$

The first two inequalities follow for the same reasons as the first two above (we are assuming that copies of $L$ and $M$ are available since they are classical). The third inequality is quantum data processing. The fourth inequality follows from (36) and continuity of entropy. The first equality follows because the mutual information between a variable and a copy of it is equal to its entropy. The final equality follows because the entropy is additive for a tensor-power state.

Optimality of the bound $R + S \ge H(X)$ for negative $S$ follows by considering a protocol whereby Alice uses classical communication alone in order to simulate the measurement output $X^n$ and generate common randomness $M$ with Bob. The converse in this case proceeds as follows:

$$\begin{aligned}
nR &\ge H(L) \\
&= I(L; \bar{L}) \\
&\ge I(X'^n M'; X^n M) \\
&\ge I(\bar{X}^n \bar{M}; X^n M)_\sigma - n\epsilon' \\
&= I(\bar{X}^n; X^n) + I(\bar{M}; M) - n\epsilon' \\
&= nH(X) + n|S| - n\epsilon'.
\end{aligned}$$

The second inequality follows because Bob and Alice have to process $L$ and its copy $\bar{L}$ in order to recover the approximate $X^n M$ and $X'^n M'$, respectively. The third inequality follows because these systems should be close to the ideal ones for a good protocol (and applying continuity of entropy). The next equalities follow because the information quantities factor as above for the ideal state.



2.5 Extension to quantum instruments 

We now briefly review Winter's argument for extending the above protocol from POVMs to quan- 
tum instruments. A quantum instrument is the most general model for quantum measurement 
that includes both a classical output and a post-measurement quantum state [20^ l2Tj 08] . Our goal 
is now to simulate the action of a given quantum instrument on many copies of an input state p 
using as few resources as possible. The simulation should be such that Bob possesses the classical 
output at the end of the protocol (as in the case of POVM compression), and, as an additional 
requirement, Alice possesses the quantum output. 

In the present setting, we can conveniently treat a quantum instrument as a completely positive, trace-preserving (CPTP) map $\mathcal{N}_{\text{instr}}$ of the form

$$\mathcal{N}_{\text{instr}}(\rho) = \sum_x \mathcal{N}_x(\rho) \otimes |x\rangle\langle x|, \tag{37}$$

where each $\mathcal{N}_x$ is a completely positive, trace-non-increasing map of the form

$$\mathcal{N}_x(\rho) = \sum_y N_{x,y}\,\rho\,N_{x,y}^{\dagger},$$

such that

$$\sum_y N_{x,y}^{\dagger} N_{x,y} \le I.$$

The simulation, implemented by a sequence of maps $\tilde{\mathcal{N}}^{n}_{\text{instr}}$, is defined to be faithful if the following condition holds:

Definition 7 (Faithful instrument simulation) A sequence of maps $\tilde{\mathcal{N}}^{n}_{\text{instr}}$ provides a faithful simulation of the quantum instrument $\mathcal{N}_{\text{instr}}$ on the state $\rho$ if for all $\epsilon > 0$ and sufficiently large $n$, the action of the approximation channel on many copies of a purification $\phi_\rho$ of $\rho$ is indistinguishable from the true quantum instrument, up to a factor of $\epsilon$:

$$\left\| (\mathrm{id} \otimes \tilde{\mathcal{N}}^{n}_{\text{instr}})(\phi_\rho^{\otimes n}) - (\mathrm{id} \otimes \mathcal{N}_{\text{instr}}^{\otimes n})(\phi_\rho^{\otimes n}) \right\|_1 \le \epsilon. \tag{38}$$

In this case, we have the following theorem:

Theorem 8 (Instrument simulation) Let $\rho$ be a source state and $\mathcal{N}_{\text{instr}}$ an instrument to simulate on this state. A protocol for a faithful feedback simulation of $\mathcal{N}_{\text{instr}}$ with classical communication rate $R$ and common randomness rate $S$ exists if and only if

$$R \ge I(X;R),$$
$$R + S \ge H(X),$$

where the entropies are with respect to a state of the following form:

$$\sum_x |x\rangle\langle x|^{X} \otimes \mathrm{Tr}_A\{(I^{R} \otimes \mathcal{N}_x^{A})(\phi^{RA})\}, \tag{39}$$

and $\phi^{RA}$ is some purification of the state $\rho$. The simulation is such that Alice possesses the quantum output of the channel and Bob possesses the classical output.

Proof. We just prove achievability because the converse theorem from the previous section applies to this case as well. We start by considering the case in which every map $\mathcal{N}_x$ can be written as

$$\mathcal{N}_x(\rho) = N_x\,\rho\,N_x^{\dagger}. \tag{40}$$

(The general case, stated in Section V-G of Ref. [26] though lacking a formal proof, will also be addressed.)

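We construct an approximation instrument using the protocol in the achievability proof from Section 2.3. Let us set

$$\Lambda_x = N_x^{\dagger} N_x,$$

and construct the operators $\Gamma^{(m)}_{x^n}$ from $\Lambda_x$ and the state $\rho$ as in (32), such that they satisfy all of the properties that we had before. Define the distribution $p_X(x)$ and the states $\hat{\rho}_x$ as we did before:

$$p_X(x) = \mathrm{Tr}\{\Lambda_x \rho\},$$
$$\hat{\rho}_x = \frac{1}{p_X(x)}\sqrt{\rho}\,\Lambda_x\sqrt{\rho}.$$

We construct the approximation instrument $\tilde{\mathcal{N}}^{n}_{\text{instr}}$ from $p_X(x)$, $\hat{\rho}_x$, and the Kraus operators $N_x$. First, consider that the approximation instrument $\tilde{\mathcal{N}}^{n}_{\text{instr}}$ will be a convex combination of some other instruments:

$$\tilde{\mathcal{N}}^{n}_{\text{instr}}(\sigma) = \frac{1}{|\mathcal{M}|}\sum_m \tilde{\mathcal{N}}^{(m)}_{\text{instr}}(\sigma), \tag{41}$$

where

$$\tilde{\mathcal{N}}^{(m)}_{\text{instr}}(\sigma) = \sum_{x^n} F^{(m)}_{x^n}\,\sigma\,F^{(m)\dagger}_{x^n} \otimes |x^n\rangle\langle x^n|, \qquad \sum_{x^n} F^{(m)\dagger}_{x^n} F^{(m)}_{x^n} \le I.$$

We now construct the operators $F^{(m)}_{x^n}$. Define the conditional distribution $\tilde{p}_{X^n|M}(x^n|m)$ as follows:

$$\tilde{p}_{X^n|M}(x^n|m) = \frac{1}{|\mathcal{L}|}\,\big|\{l : x^n(l,m) = x^n\}\big|,$$

as in (32), and let $p_M(m) = 1/|\mathcal{M}|$, so that the marginal distribution $\tilde{p}_{X^n}(x^n)$ is as follows:

$$\tilde{p}_{X^n}(x^n) = \frac{1}{|\mathcal{L}||\mathcal{M}|}\,\big|\{l,m : x^n(l,m) = x^n\}\big|.$$

Take a left polar decomposition of the operator $N_x\sqrt{\rho}$ and use it to define the unitary operator $U_x$:

$$N_x\sqrt{\rho} = U_x\sqrt{\sqrt{\rho}\,N_x^{\dagger}N_x\sqrt{\rho}} = U_x\sqrt{p_X(x)\,\hat{\rho}_x}. \tag{42}$$

Let $U_{x^n}$ be as follows:

$$U_{x^n} = U_{x_1} \otimes \cdots \otimes U_{x_n}.$$

We define the Kraus operators $F^{(m)}_{x^n}$ for the instruments $\tilde{\mathcal{N}}^{(m)}_{\text{instr}}$ as follows:

$$F^{(m)}_{x^n} = \sqrt{\frac{S}{1+\epsilon}\,\tilde{p}_{X^n|M}(x^n|m)}\ U_{x^n}\,\sqrt{\xi_{x^n}}\,\omega^{-1/2}. \tag{43}$$

One can check that these define completely positive trace-non-increasing instruments $\tilde{\mathcal{N}}^{(m)}_{\text{instr}}$; this follows from the fact that the operators $\Gamma^{(m)}_{x^n}$ in (32) form a sub-POVM. The instruments $\tilde{\mathcal{N}}^{(m)}_{\text{instr}}$ in turn form the instrument $\tilde{\mathcal{N}}^{n}_{\text{instr}}$ via the relation in (41).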
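The polar decomposition in (42) can be reproduced numerically from a singular value decomposition. The sketch below assumes NumPy; the helper `left_polar`, the random Kraus operator `n_x` (scaled so that $N_x^\dagger N_x \le I$), and the random state are hypothetical choices used only to illustrate the identity $N_x\sqrt{\rho} = U_x\sqrt{p_X(x)\hat{\rho}_x}$.

```python
import numpy as np

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def left_polar(a):
    """Return (U, P) with A = U P, U unitary and P = sqrt(A^dagger A)."""
    w, s, vh = np.linalg.svd(a)
    return w @ vh, vh.conj().T @ np.diag(s) @ vh

rng = np.random.default_rng(2)
dim = 3
g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = g @ g.conj().T
rho /= np.trace(rho).real                    # random full-rank density operator
k = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
n_x = k / (2.0 * np.linalg.norm(k, 2))       # Kraus operator with N^dagger N <= I

sqrt_rho = psd_sqrt(rho)
u_x, p_part = left_polar(n_x @ sqrt_rho)
lam_x = n_x.conj().T @ n_x                   # Lambda_x = N_x^dagger N_x
p_x = np.trace(lam_x @ rho).real             # p_X(x) = Tr{Lambda_x rho}
rho_hat = sqrt_rho @ lam_x @ sqrt_rho / p_x  # conditional state hat{rho}_x

print(np.allclose(u_x @ p_part, n_x @ sqrt_rho))
print(np.allclose(p_part @ p_part, p_x * rho_hat))
```

Both checks should print `True`: the positive part squares to $\sqrt{\rho}\,N_x^\dagger N_x\sqrt{\rho} = p_X(x)\hat{\rho}_x$, and the unitary factor is what the Kraus operators in (43) reuse.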



We can now check that this construction satisfies the condition in (38) for a faithful simulation. 
Consider a purification 

<f>f n = (l®^)\I)(I\(l®Vu>), 
where |J) is the vector obtained by "flipping the bra" of the identity channel ^z"l xn )( x " l : 

ii) = 5>»>i*»). 

We have the following bound on the instrument simulation performance: 
||(id^^)(C)-( id ^^)(C 

(/ ® N x nyfa) \I){I\ (/ ® ^Nl) - -L ^(j ® #V^) |/)(/| (/ ® V^i^) 

' ' m 

p X " (X n ) (I ® t^n VA^) 1 1) (I\ (i ® y/p^U^n ) - 



< Yl \\p xn ^ y 1 ® v^») i J ) ( J i ( J ® v^») - (x n ) (i ® x/e^j |/) (/| (i ® 

+ £ ^ n ( 7 ® v^) 1-00*1 ( J ® v 7 ^) - ^(^)t^ ( 7 ® v^) 1-00*1 ® 



i- 
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The first equality follows from the block structure of the instruments with respect to the classical 



flags \x n ){x n \. The second equality follows by substituting the polar decomposition in (42) and the 



last inequality is the triangle inequality. Continuing, we have 



definition in (43). The third equality follows from the unitary invariance of the trace norm. The 



< 



x n 

+£ 



Px«(x n ) -pj^{x n ) 



pxA\i){i\(i ® \fp^\ - (i 1 

,5' 



6r n 



1 + e 



PX™{x n )y\\px- 



i" i 



+ 2e 



< 2y/2 J^p Xn {x n )\\p x n -ixA\i + 2e 



< 2V^V e + 2\Q + 2^7' + 2e. 

The first inequality follows by factoring out the distribution $p_{X^n}(x^n)$ and because the positive operator $(I\otimes\sqrt{\xi_{x^n}})\,|I\rangle\langle I|\,(I\otimes\sqrt{\xi_{x^n}})$ has trace less than one. The second inequality follows from Winter's Lemma 14:



< 



\Px n — £,x n 



and our previous bound in (34). The third inequality is from concavity of the quartic-root function, and the final one follows from our previous bounds in (35) and the fact that the probability mass of the atypical set is upper bounded by $\epsilon$.

We are now ready to consider the general case, in which one wants to simulate an instrument of the form (37). For this purpose, we require a slightly different coding strategy that combines ideas from Section 2.3 and the above development.



First, consider that it is possible to implement a quantum instrument of the form in (37) by tracing over an auxiliary register $Y$:

$$\mathcal{N}_{\text{instr}}(\rho) = \mathrm{Tr}_Y\Bigl\{\sum_{x,y} N_{x,y}\,\rho\,N_{x,y}^\dagger\otimes|x\rangle\langle x|^X\otimes|y\rangle\langle y|^Y\Bigr\}.$$

So, Alice and Bob will simulate the following instrument

$$\sum_{x,y} N_{x,y}\,\rho\,N_{x,y}^\dagger\otimes|x\rangle\langle x|^X\otimes|y\rangle\langle y|^Y, \tag{44}$$

in such a way that Bob does not receive the outcome $y$, and thus they effectively implement the instrument $\mathcal{N}_{\text{instr}}$. The idea is that they will exploit a code with the following structure:

1. Alice communicates $nI(X;R)$ bits of classical communication to Bob (enough for him to reconstruct the $x$ output).

2. Alice keeps $nI(Y;R|X)$ bits of the output to herself.

3. Alice exploits $nH(X|R)$ bits of common randomness shared with Bob in the simulation.

4. Alice uses $nH(Y|XR)$ bits of local, uniform randomness (not shared with Bob).

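The entropies are with respect to the following classical-quantum state:

$$\sum_{x,y}\mathrm{Tr}_A\bigl\{\bigl(I^R\otimes(N_{x,y}^\dagger N_{x,y})^A\bigr)\phi^{RA}\bigr\}\otimes|x\rangle\langle x|^X\otimes|y\rangle\langle y|^Y,$$

and observe that $I(X;R)$ and $H(X|R)$ are invariant with respect to the choice of Kraus operators $\{N_{x,y}\}$ for each map $\mathcal{N}_x$.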
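
As a quick consistency check on this accounting (an added remark that uses only standard entropy identities for the classical registers $X$ and $Y$), the four resources in the list above pair up as

$$nI(X;R) + nH(X|R) = nH(X), \qquad nI(Y;R|X) + nH(Y|XR) = nH(Y|X),$$

so that together they account for $nH(X) + nH(Y|X) = nH(XY)$ bits, the entropy of the full outcome pair, of which only the $nI(X;R)$ bits are actually communicated to Bob.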

More precisely, the measurements used in the simulation are chosen randomly as in the proof in Section 2.3 with the following modifications. Choose $|\mathcal{L}_1||\mathcal{M}_1|$ codewords $x^n(l_1,m_1)$ independently and randomly according to a pruned version of the distribution

$$p_X(x) = \mathrm{Tr}\{\mathcal{N}_x(\rho)\} = \sum_y\mathrm{Tr}\{N_{x,y}^\dagger N_{x,y}\,\rho\},$$

with

$$|\mathcal{L}_1|\approx 2^{nI(X;R)}, \qquad |\mathcal{M}_1|\approx 2^{nH(X|R)}.$$

For each pair $(l_1,m_1)$, choose $|\mathcal{L}_2||\mathcal{M}_2|$ codewords $y^n(l_2,m_2|l_1,m_1)$ independently and randomly according to a distribution

$$p'_{Y'^n|X^n}(y^n|x^n(l_1,m_1)),$$

which is a pruned version of the conditional distribution

$$p_{Y|X}(y|x) = \frac{1}{p_X(x)}\mathrm{Tr}\{N_{x,y}^\dagger N_{x,y}\,\rho\},$$

where

$$|\mathcal{L}_2|\approx 2^{nI(Y;R|X)}, \qquad |\mathcal{M}_2|\approx 2^{nH(Y|XR)}.$$

After choosing these codewords, we have a codebook $\{x^n(l_1,m_1),\,y^n(l_2,m_2|l_1,m_1)\}$. Divide all of these codewords into $|\mathcal{M}_1||\mathcal{M}_2|$ sets of the form $\{x^n(l_1,m_1),\,y^n(l_2,m_2|l_1,m_1)\}_{l_1,l_2}$. In order to have a faithful simulation, we require several conditions analogous to (29) and (30) to hold (except that the first average similar to that in (29) is over just $l_1$ and there is another over both $l_1$ and $l_2$, and the other operators, like those in (30), are with respect to both $l_1$ and $m_1$ and all of $l_1$, $l_2$, $m_1$, and $m_2$). Choosing the sizes of the sets as we do above and applying the Operator Chernoff Bound several times guarantees that there exists a choice of the codebook $\{x^n(l_1,m_1),\,y^n(l_2,m_2|l_1,m_1)\}$ such that these conditions hold. By the development at the end of Section 2.3 and the result for instruments of the special form (40), it follows that these conditions lead to a faithful simulation.
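
The nested structure of this codebook can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sample`, `build_codebook`) and the toy distributions are hypothetical, the codebook sizes are tiny rather than exponential in $n$, and the pruning to typical sequences is omitted entirely.

```python
import random

def sample(dist, rng):
    """Draw one symbol from a dict {symbol: probability}."""
    symbols = list(dist)
    weights = [dist[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=1)[0]

def build_codebook(p_x, p_y_given_x, n, L1, M1, L2, M2, seed=0):
    """Return (x_code, y_code) with
    x_code[(l1, m1)] = x^n(l1, m1) and
    y_code[(l1, m1, l2, m2)] = y^n(l2, m2 | l1, m1).
    Assumption: no pruning to the typical set is performed in this sketch."""
    rng = random.Random(seed)
    x_code, y_code = {}, {}
    for l1 in range(L1):
        for m1 in range(M1):
            # outer codeword: i.i.d. draws from p_X
            xs = tuple(sample(p_x, rng) for _ in range(n))
            x_code[(l1, m1)] = xs
            for l2 in range(L2):
                for m2 in range(M2):
                    # inner codeword: drawn letter by letter, conditioned on x^n(l1, m1)
                    ys = tuple(sample(p_y_given_x[x], rng) for x in xs)
                    y_code[(l1, m1, l2, m2)] = ys
    return x_code, y_code

# toy (hypothetical) distributions on binary alphabets
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
x_code, y_code = build_codebook(p_x, p_y_given_x, n=4, L1=2, M1=3, L2=2, M2=2)
```

Fixing the pair $(m_1, m_2)$ and letting $(l_1, l_2)$ vary picks out one of the $|\mathcal{M}_1||\mathcal{M}_2|$ sets described above.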

The simulation then operates by having the variable $m_1$ be common randomness shared with Bob and $m_2$ be additional local, uniform randomness that Alice uses for picking the measurement; all of the measurements have outcomes $l_1$ and $l_2$. After performing the measurement simulation, Alice sends the outcome $l_1$ to Bob, which he can subsequently use to reconstruct the codeword $x^n(l_1,m_1)$ by combining it with his share $m_1$ of the common randomness. The proof as we had it before goes through; the only difference is in constructing the codebook in such a way that the sequences $x^n$ and $y^n$ are separated out. The simulated instrument has a form like that in (44), and if Alice discards $y^n$, it follows, by applying the monotonicity of trace distance to the condition in (38), that Alice and Bob simulate the original instrument. ■

As a closing note for this section, we would like to mention that the quantity $I(X;R)$ appearing in Theorem 8 has a long history. Since $I(X;R)$ measures the amount of data created by the quantum measurement, in contrast to the shared randomness that exists before the measurement itself, it seems natural to consider it as a measure of the information gain produced by the quantum measurement. In this connection, the 1971 paper of Groenewold was the first to put forward the problem of measuring the information gain in "quantal" measurements by means of information-theoretic quantities [33]. Groenewold considered the following quantity (reformulated according to our notation):

$$G(\rho,\mathcal{N}_{\text{instr}}) := H(\rho) - \sum_x p_X(x)\,H\bigl(\mathcal{N}_x(\rho)/p_X(x)\bigr),$$

and he conjectured its positivity for von Neumann-Lüders measurements. Keep in mind that, at that time, the theory of quantum instruments was in its infancy, and the von Neumann-Lüders state reduction postulate, according to which the initial state is projected onto the eigenspace corresponding to the observed outcome, was the only model of state reduction usually considered. Subsequently, Groenewold's conjecture was proved by Lindblad [42]. As the theory of quantum instruments advanced [48], quantum instruments with negative Groenewold information gain appeared to be the rule, rather than the exception, until Ozawa finally settled the problem by proving that $G(\rho,\mathcal{N}_{\text{instr}})$ is nonnegative for all states $\rho$ if and only if the quantum instrument has the special form in (40) [49].



The point is that, for quantum instruments of the form in (40), Groenewold's information gain $G(\rho,\mathcal{N}_{\text{instr}})$ is equal to $I(X;R)$ [P2]. This is a consequence of the fact that, for any matrix $K$, $K^\dagger K$ and $KK^\dagger$ have the same nonzero eigenvalues (i.e., the squares of the singular values of $K$), so that

$$H\bigl(\mathcal{N}_x(\rho)/p_X(x)\bigr) = H\bigl(N_x\rho N_x^\dagger/p_X(x)\bigr) = H\bigl(\sqrt{\rho}\,N_x^\dagger N_x\sqrt{\rho}/p_X(x)\bigr).$$

This coincidence retroactively strengthens the interpretation of $G(\rho,\mathcal{N}_{\text{instr}})$ as the information gain due to a quantum measurement, at least in the special case of instruments satisfying (40). In those cases, due to Winter's measurement compression theorem, $G(\rho,\mathcal{N}_{\text{instr}})$ truly is the rate at which the instrument generates information. More generally, however, $I(X;R)$ is the better measure of information gain both because it is nonnegative and because it always has the full strength of Winter's theorem behind it.
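
The equality $G = I(X;R)$ can be checked numerically in the simplest setting. The sketch below is an illustration only and assumes a commuting (diagonal) special case: $\rho$ is diagonal with eigenvalues `q`, and each Kraus operator $N_x$ is diagonal in the same basis with $\langle i|N_x^\dagger N_x|i\rangle$ given by `a[x][i]`. The numbers are hypothetical. In this case all entropies reduce to Shannon entropies, and $I(X;R)$ is the classical mutual information of the joint distribution $q_i\,a_x(i)$.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

q = [0.6, 0.4]                       # eigenvalues of rho (assumed diagonal)
a = {0: [0.9, 0.3], 1: [0.1, 0.7]}   # a[x][i]; for each i the entries sum to 1 over x

# outcome probabilities p_X(x) = Tr{N_x^dag N_x rho}
p_x = {x: sum(qi * ai for qi, ai in zip(q, a[x])) for x in a}

# Groenewold's information gain: H(rho) - sum_x p_X(x) H(post-measurement state)
G = entropy(q) - sum(
    p_x[x] * entropy([qi * ai / p_x[x] for qi, ai in zip(q, a[x])])
    for x in a
)

# mutual information I(X;R) = H(X) + H(R) - H(XR) of the joint p(i, x) = q_i a_x(i)
joint = [qi * a[x][i] for x in a for i, qi in enumerate(q)]
I_XR = entropy(p_x.values()) + entropy(q) - entropy(joint)
```

The two quantities are computed from different expressions and agree up to floating-point error, as the identity in the text predicts for instruments of the form (40).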



2.5.1 Application to channels 

As already noticed in [65], with Theorem 8 at hand, it is easy to consider the case in which one wants to simulate the action of some CPTP map $\mathcal{N}$ on many copies of the state $\rho$. The idea is that, for every Kraus representation [40] of the map $\mathcal{N}$ as

$$\mathcal{N}(\sigma) = \sum_x N_x\,\sigma\,N_x^\dagger, \tag{45}$$

where $N_x$ are a set of Kraus operators satisfying

$$\sum_x N_x^\dagger N_x = I,$$

one can apply Theorem 8 and simulate the corresponding quantum instrument

$$\mathcal{N}_{\text{instr}}(\rho) = \sum_x N_x\,\rho\,N_x^\dagger\otimes|x\rangle\langle x|.$$

Then, any protocol faithfully simulating the above instrument automatically leads, by monotonicity of trace distance, to a faithful simulation of the channel $\mathcal{N}$, in the sense that it provides a sequence of maps $\tilde{\mathcal{N}}^n$ such that

$$\bigl\|(\mathrm{id}\otimes\mathcal{N}^{\otimes n})(\phi_\rho^{\otimes n}) - (\mathrm{id}\otimes\tilde{\mathcal{N}}^n)(\phi_\rho^{\otimes n})\bigr\|_1\le\epsilon,$$

for any $\epsilon>0$ and sufficiently large $n$.

An important thing to stress is that the rates obtained in this way depend on the particular Kraus representation used to construct the instrument $\mathcal{N}_{\text{instr}}$. The rates of consumption of classical resources can hence be minimized over all possible Kraus representations of a given channel. However, such an optimization turns out to be difficult in general, as the following example shows.

Let us consider the case of a channel that can be written as a mixture of unitaries, i.e.,

$$\mathcal{N}(\rho) = \sum_x p(x)\,U_x\,\rho\,U_x^\dagger, \tag{46}$$

where $U_x^\dagger U_x = I$. Such a channel can be simulated without the need for classical communication. This follows simply from the fact that the quantum instrument constructed from (46) corresponds to measuring the POVM $\Lambda_x = p(x)I$, whose outcomes are completely random and uncorrelated with the reference, so that $I(X;R)=0$. In fact, the converse is also true: if a given channel admits a Kraus decomposition for which $I(X;R)=0$, then its action on the state $\rho$ can be written as a mixture of unitaries as in (46). In order to show this, suppose that we find a Kraus decomposition $\mathcal{N}(\rho)=\sum_x N_x\rho N_x^\dagger$ such that the quantum mutual information $I(X;R)=0$, where it is calculated with respect to the following classical-quantum state

$$\sum_x|x\rangle\langle x|^X\otimes\mathrm{Tr}_A\bigl\{(I^R\otimes N_x^\dagger N_x)\,\phi^{RA}\bigr\},$$

and $\phi^{RA}$ is a purification of $\rho$. Adopting the same notation used at the beginning of Section 2.3, we know that $I(X;R)=0$ if and only if the sub-normalized states $\sqrt{\rho}\,(N_x^\dagger N_x)^T\sqrt{\rho}$ of the system $R$ are all proportional to $\rho$. This is possible if and only if the operators $(N_x^\dagger N_x)^T$ are all proportional to the identity (on the support of $\rho$), thus proving the claim.
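
For the forward direction, the vanishing of the mutual information can be made explicit in one line (an added worked step, writing $\phi^R = \mathrm{Tr}_A\{\phi^{RA}\}$ for the reduced state of the reference). With Kraus operators $N_x = \sqrt{p(x)}\,U_x$, we have $N_x^\dagger N_x = p(x)I$, so the classical-quantum state above becomes

$$\sum_x|x\rangle\langle x|^X\otimes\mathrm{Tr}_A\bigl\{(I^R\otimes p(x)I^A)\,\phi^{RA}\bigr\} = \Bigl(\sum_x p(x)\,|x\rangle\langle x|^X\Bigr)\otimes\phi^R,$$

which is a product state between $X$ and $R$ and therefore has $I(X;R)=0$.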

Hence, as the above example shows, minimizing the rate of classical communication needed to simulate a quantum channel is a task of complexity comparable to that of deciding whether a given channel possesses a random-unitary Kraus decomposition, for which numerical methods are known [3] but a general analytical solution has yet to be found.






3 Non-feedback measurement compression 



We now discuss an extension of Winter's measurement compression theorem in which the sender is not required to obtain the outcome of the measurement simulation (known as a "non-feedback" simulation). Achieving a feedback simulation is more demanding than one without feedback, so we should expect the non-feedback problem to show some reduction in the resources required. Seeing where the improvement comes from requires considering a more general type of POVM decomposition than that in (3). Suppose that it is possible to decompose a POVM $\{\Lambda_x\}$ in terms of a random selection according to a random variable $M$, an "internal" POVM $\{\Gamma^{(m)}_w\}$ with outcomes $w$, and a classical post-processing map $p_{X|W}(x|w)$ [33 EE]:

$$\Lambda_x = \sum_{m,w}p_M(m)\,\Gamma^{(m)}_w\,p_{X|W}(x|w). \tag{47}$$

In that case, Alice and Bob could proceed with a protocol along the following lines: they use Winter's measurement compression protocol to simulate the POVM $\{\sum_m p_M(m)\,\Gamma^{(m)}_w\}_w$, and Bob locally simulates the classical post-processing map $p_{X|W}(x|w)$. (This is essentially how a "non-feedback" simulation will proceed, but there are some details to be worked out.)

We should compare the performance of the above protocol against one that exploits a feedback simulation for $\{\Lambda_x\}$. The classical communication cost will increase from $I(X;R)$ to $I(W;R)$ (the data-processing inequality $I(W;R)\ge I(X;R)$ holds because $W$ is "closer" to $R$ than is $X$), but the common randomness cost will be cheaper because the non-feedback protocol requires only $nI(W;X|R)$ bits of common randomness rather than $nH(X|R)$ bits (essentially because Bob can find a clever way to simulate the local map $p_{X|W}(x|w)$). Thus, if the savings in common randomness consumption are larger than the increase in classical communication cost, then there is an advantage to performing a non-feedback simulation. In general, decomposing a POVM in many different ways according to (47) leads to a non-trivial curve characterizing the trade-off between classical communication and common randomness.
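
The comparison between the two protocols can be illustrated on a purely classical toy example (a noisy classical channel being a special case of a quantum measurement). The sketch below is an assumption-laden illustration rather than anything taken from the protocol itself: $R$ is a uniform bit, $W$ is $R$ sent through a binary symmetric channel with flip probability 0.1, and $X$ is $W$ sent through another with flip probability 0.2, so that $R - W - X$ is a Markov chain. All names and numbers are hypothetical.

```python
from math import log2
from itertools import product

def H(joint, keep):
    """Shannon entropy (bits) of the marginal of `joint` on the index positions in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def bsc(flip):
    """Transition probability of a binary symmetric channel."""
    return lambda a, b: 1 - flip if a == b else flip

# joint distribution over (r, w, x) for the chain R -> W -> X
rw, wx = bsc(0.1), bsc(0.2)
joint = {(r, w, x): 0.5 * rw(r, w) * wx(w, x)
         for r, w, x in product((0, 1), repeat=3)}
R_, W_, X_ = 0, 1, 2

I_XR = H(joint, [X_]) + H(joint, [R_]) - H(joint, [X_, R_])
H_X_given_R = H(joint, [X_, R_]) - H(joint, [R_])
I_WR = H(joint, [W_]) + H(joint, [R_]) - H(joint, [W_, R_])
I_W_XR = H(joint, [W_]) + H(joint, [X_, R_]) - H(joint, [R_, W_, X_])
I_WX_given_R = I_W_XR - I_WR          # chain rule for mutual information

# (classical communication rate, common randomness rate)
feedback = (I_XR, H_X_given_R)
non_feedback = (I_WR, I_WX_given_R)
```

In this example the communication rate goes up when passing from `feedback` to `non_feedback`, while the sum of the two rates goes down, which is the kind of trade-off described above.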

In this connection, it is important to remark that the decomposition (sometimes referred to as a refinement) of a POVM according to the post-processing relation

$$\Lambda_x = \sum_w\Gamma_w\,p_{X|W}(x|w), \tag{48}$$

of which (47) is a special case, is different from the convex decomposition described in (3). In particular, while the conditions for a POVM to be extremal (i.e., not non-trivially decomposable) with respect to (3) are known to be rather involved [TS], it turns out that POVMs which are extremal with respect to (48) can be neatly characterized as those (and only those) whose elements are all rank-one operators [35]. Hence, if the POVM that Alice and Bob want to simulate is rank-one (i.e., all its elements are rank-one operators), then there is nothing to gain from implementing a non-feedback simulation instead of a feedback simulation. Notice, however, that the two decompositions (3) and (48) are completely independent: POVMs which are extremal with respect to (3) need not also be extremal with respect to (48), and vice versa [13]. This is the reason why there is plenty of room for non-trivial trade-off relations between classical communication and common randomness if the POVM to be simulated is not rank-one.

Theorem 9 below gives a full characterization of the trade-off for a non-feedback measurement compression protocol, in the sense that the protocol summarized above has a matching single-letter converse proof for its optimality. Thus, we can claim to have a complete understanding of this task from an information-theoretic perspective.

We should mention that some of the above ideas regarding non-feedback simulation are already present in prior works [26, 17, 5], and indeed, these works are what led us to pursue a non-feedback measurement compression protocol. In Ref. [26], Devetak et al. observed in their remarks around Eqs. (43-45) of their paper that a protocol in which the sender also receives the outcomes of the simulation is optimal, but "examples are known in which less randomness is necessary" for protocols that do not have this restriction. They did not state any explicit examples, however, nor did they state that there would be a general theorem characterizing the trade-off in the non-feedback case. Cuff's theorem [17] regarding the trade-off between classical communication and common randomness for a non-feedback reverse Shannon theorem is a special case of Theorem 9 below, essentially because a noisy classical channel is a special case of a quantum measurement and thus the simulation task is a special case. Ref. [5] characterized the trade-off between quantum communication and entanglement for a non-feedback simulation of a quantum channel. Thus, Theorem 9 below "sits in between" the communication tasks considered in Ref. [17] and Ref. [5]. We should also remark that Ref. [5] stated that it is possible to reduce the common randomness cost in the non-feedback reverse Shannon theorem either with randomness recycling or by derandomizing some of it, and we should be able to employ these approaches in a non-feedback measurement compression protocol. Our approach below, however, is to modify Winter's original protocol directly by changing the structure of the code.



3.1 Non-feedback measurement compression theorem 

Theorem 9 (Non-feedback measurement compression) Let $\rho$ be a source state and $\mathcal{N}$ a quantum instrument to simulate on this state:

$$\mathcal{N}(\rho) = \sum_x\mathcal{N}_x(\rho)\otimes|x\rangle\langle x|^X.$$

A protocol for faithful non-feedback simulation of the quantum instrument with classical communication rate $R$ and common randomness rate $S$ exists if and only if $R$ and $S$ are in the rate region given by the union of the following regions:

$$R\ge I(W;R),$$
$$R+S\ge I(W;XR),$$

where the entropies are with respect to a state of the following form:

$$\sum_{x,w}p_{X|W}(x|w)\,|w\rangle\langle w|^W\otimes|x\rangle\langle x|^X\otimes\mathrm{Tr}_A\bigl\{(\mathrm{id}^R\otimes\mathcal{M}_w^A)(\phi^{RA})\bigr\}, \tag{49}$$

$\phi^{RA}$ is some purification of the state $\rho$, and the union is with respect to all decompositions of the original instrument $\mathcal{N}$ of the form:

$$\mathcal{N}(\rho) = \sum_{x,w}p_{X|W}(x|w)\,\mathcal{M}_w(\rho)\otimes|x\rangle\langle x|^X. \tag{50}$$

Observe that the systems $R$, $W$, and $X$ in (49) form a quantum Markov chain: $R - W - X$.

The information quantity $I(W;XR)$ appearing in the above theorem generalizes Wyner's well-known "common information" between dependent random variables [67].
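
To make the "union of regions" in the theorem concrete, the following sketch tests whether a rate pair lies in the region generated by a finite list of decompositions. It is only an illustration: the function name `in_rate_region` is hypothetical, the true region is a union over all decompositions of the form (50), and the pairs of values $(I(W;R),\,I(W;XR))$ used here are made-up placeholders rather than quantities computed from an actual instrument.

```python
def in_rate_region(R, S, decompositions, tol=1e-12):
    """Return True if (R, S) satisfies R >= I(W;R) and R + S >= I(W;XR)
    for at least one decomposition.

    decompositions: iterable of (I_WR, I_W_XR) pairs, one per decomposition.
    """
    return any(R >= i_wr - tol and R + S >= i_w_xr - tol
               for i_wr, i_w_xr in decompositions)

# hypothetical values of (I(W;R), I(W;XR)) for two candidate decompositions
decomps = [(0.2, 1.0), (0.5, 0.8)]

print(in_rate_region(0.5, 0.3, decomps))  # admitted by the second decomposition
print(in_rate_region(0.2, 0.8, decomps))  # admitted by the first decomposition
print(in_rate_region(0.2, 0.5, decomps))  # admitted by neither
```

A pair only needs to be admissible for one decomposition, which is why different decompositions can trade communication against common randomness.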
3.2 Achievability for non-feedback measurement compression

We now prove the achievability part of the above theorem. Suppose for simplicity that we are just trying to simulate the POVM $\Lambda = \{\Lambda_x\}$, where each $\Lambda_x$ is a positive operator such that $\Lambda_x = \sum_w p_{X|W}(x|w)\,M_w$ and each $M_w$ is a positive operator. The case for a general quantum instrument follows by considering this case and by extending it similarly to how we extended POVM compression to instrument compression in Theorem 8. So, the relevant overall classical-quantum state to consider when building codes for a non-feedback simulation is as follows:

$$\sum_{x,w}p_{X|W}(x|w)\,\mathrm{Tr}_A\{M_w^A\,\phi_\rho^{RA}\}\otimes|w\rangle\langle w|^W\otimes|x\rangle\langle x|^X,$$

which simplifies to

$$\sum_{x,w}p_{X|W}(x|w)\,\sqrt{\rho}\,M_w\sqrt{\rho}\otimes|w\rangle\langle w|^W\otimes|x\rangle\langle x|^X$$

after realizing that $\mathrm{Tr}_A\{M_w^A\,\phi_\rho^{RA}\} = \sqrt{\rho}\,M_w\sqrt{\rho}$ (in the above and what follows, we ignore the transpose in the eigenbasis of $\rho$ on $M_w$ because it is irrelevant for the result). Let $\tau$ denote the state obtained by tracing over the $W$ register of the above state:

$$\tau = \sum_w\sqrt{\rho}\,M_w\sqrt{\rho}\otimes\sigma_w,$$

where the classical state $\sigma_w$ is as follows:

$$\sigma_w = \sum_x p_{X|W}(x|w)\,|x\rangle\langle x|^X.$$

Consider the following ensemble:

$$p_W(w) = \mathrm{Tr}\{M_w\,\rho\},$$
$$\rho_w = \frac{1}{p_W(w)}\sqrt{\rho}\,M_w\sqrt{\rho}.$$

Observe that $\rho$ is the expected state of this ensemble:

$$\sum_w p_W(w)\,\rho_w = \sum_w\sqrt{\rho}\,M_w\sqrt{\rho} = \rho.$$

Also, the state $\tau$ is as follows:

$$\tau = \sum_w p_W(w)\,\rho_w\otimes\sigma_w.$$

w 

Our approach is similar to Winter's approach detailed in Section 2.3: choose $|\mathcal{L}||\mathcal{M}|$ codewords $w^n(l,m)$ according to the pruned version of the distribution $p_{W^n}(w^n)$. As long as

$$|\mathcal{L}|\approx 2^{nI(W;R)}, \qquad |\mathcal{M}|\approx 2^{nI(W;X|R)}, \qquad |\mathcal{L}||\mathcal{M}|\approx 2^{nI(W;XR)},$$






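the operator Chernoff bound (Lemma 6) guarantees that there exists a choice of the codewords $w^n(l,m)$ such that the following conditions are true:

$$\frac{1}{|\mathcal{L}|}\sum_l\rho'_{w^n(l,m)}\in[(1\pm\epsilon)\,\rho'^n], \tag{51}$$
$$\frac{1}{|\mathcal{L}||\mathcal{M}|}\sum_{l,m}\kappa_{w^n(l,m)}\in[(1\pm\epsilon)\,\tau'^n], \tag{52}$$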

l,m 

where each $\rho'_{w^n}$ is a typical projected version of $\rho_{w^n} = \rho_{w_1}\otimes\cdots\otimes\rho_{w_n}$:

$$\rho'_{w^n} = \Pi\,\Pi^n_{\rho,\delta}\,\Pi_{\rho_{w^n},\delta}\;\rho_{w^n}\;\Pi_{\rho_{w^n},\delta}\,\Pi^n_{\rho,\delta}\,\Pi.$$

In the above, $\Pi_{\rho_{w^n},\delta}$ is the conditionally typical projector for $\rho_{w^n}$, $\Pi^n_{\rho,\delta}$ is the average typical projector for $\rho^{\otimes n}$, and $\Pi$ is the eigenvalue cutoff projector as before. We define each $\kappa_{w^n(l,m)}$ as a typical projected version of the state $\rho_{w^n(l,m)}\otimes\sigma_{w^n(l,m)}$:

$$\kappa_{w^n} = \Pi'\,\Pi^n_{\tau,\delta}\,(\Pi_{\rho_{w^n},\delta}\otimes\Pi_{\sigma_{w^n},\delta})\;\rho_{w^n}\otimes\sigma_{w^n}\;(\Pi_{\rho_{w^n},\delta}\otimes\Pi_{\sigma_{w^n},\delta})\,\Pi^n_{\tau,\delta}\,\Pi'.$$

In the above, $\Pi_{\rho_{w^n},\delta}\otimes\Pi_{\sigma_{w^n},\delta}$ is the conditionally typical projector for $\rho_{w^n}\otimes\sigma_{w^n}$, $\Pi^n_{\tau,\delta}$ is an average typical projector for $\tau$, and $\Pi'$ is another eigenvalue cutoff projector. The states $\rho'^n$ and $\tau'^n$ are the expectations of the states $\rho'_{w^n}$ and $\kappa_{w^n}$, respectively, with respect to the pruned version of the distribution $p_{W^n}(w^n)$. Recall that the operator Chernoff bound guarantees with high probability that the sample averages $\frac{1}{|\mathcal{L}|}\sum_l\rho'_{w^n(l,m)}$ and $\frac{1}{|\mathcal{L}||\mathcal{M}|}\sum_{l,m}\kappa_{w^n(l,m)}$ are within $\epsilon$ (in the operator interval sense) of their true expectations $\rho'^n$ and $\tau'^n$, respectively. Thus there exist particular values of the $w^n(l,m)$ such that the above conditions are all true. We can use the condition in (51) to guarantee that the following defines a legitimate POVM (just as in Winter's approach in Section 2.3):

$$\Upsilon^{(m)}_l = \frac{S}{1+\epsilon}\,\frac{1}{|\mathcal{L}|}\;\omega^{-1/2}\,\rho'_{w^n(l,m)}\,\omega^{-1/2},$$

where $S$ is the mass of the typical set corresponding to the distribution $p_{W^n}(w^n)$ and $\omega = \rho^{\otimes n}$. Also, observe that the following states are close in trace distance for sufficiently large $n$, due to quantum typicality and the Gentle Operator Lemma:

$$\|\rho'_{w^n} - \rho_{w^n}\|_1\le f_1(\epsilon), \tag{53}$$
$$\|\kappa_{w^n} - \rho_{w^n}\otimes\sigma_{w^n}\|_1\le f_2(\epsilon). \tag{54}$$

Here and in what follows, $f_i(\epsilon)$ is some polynomial in $\epsilon$ so that $\lim_{\epsilon\to0}f_i(\epsilon) = 0$.

We use the conditions in (51) and (52) to guarantee that the simulation is faithful. The protocol proceeds as follows: Alice and Bob use the common randomness $M$ to select a POVM $\Upsilon^{(m)}$. Alice performs a measurement and gets the outcome $l$ (corresponding to the operator $\Upsilon^{(m)}_l$). She sends the index $l$ to Bob, who then prepares the classical state $\sigma_{w^n(l,m)}$ based on the common randomness $m$ and the measurement outcome $l$. Consider the following chain of equalities:

Tr^n { (id ® K« ) « n ) } <8> \x n ){x n \ 
= Tv A 4(id®M wn )(<p® n )}®p Xnlw 4x n \w n )\x n )(x n \ 

W 71 ,'X 71 



E^Tr„{(id^T^)(^)} 



1 x - S „ 



m,l 



By exploiting (52), that $\|\tau^{\otimes n} - \tau'^n\|_1\le f_3(\epsilon)$, and that $\|\rho^{\otimes n} - \rho'^n\|_1\le f_4(\epsilon)$, we have that



— r 



|A<||£| 



K w n (l,m) 



Also, we have that 



1 \ ~» S ^/ 1 \ ^ 



m,l 



From (53) and (54) and convexity of trace distance, we have that 



\M\\£\^ KwHl > m) |.M||£|^^ n ^ 

ml ml 



m) SSI a w n (l,m) 



i < / 7 (e). 



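Putting all of these together with the triangle inequality gives an upper bound on the trace distance between the ideal output of the measurement and the state resulting from the simulation:

$$\Bigl\|\sum_{x^n}\mathrm{Tr}_{A^n}\bigl\{(\mathrm{id}\otimes\Lambda_{x^n})(\phi_\rho^{\otimes n})\bigr\}\otimes|x^n\rangle\langle x^n| - \sum_{m,l}\frac{1}{|\mathcal{M}|}\mathrm{Tr}_{A^n}\bigl\{(\mathrm{id}\otimes\Upsilon^{(m)}_l)(\phi_\rho^{\otimes n})\bigr\}\otimes\sigma_{w^n(l,m)}\Bigr\|_1\le f_8(\epsilon).$$

The case for a general quantum instrument follows by reasoning similar to that in the proof of Theorem 8.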



3.3 Converse for non-feedback measurement compression 

We now prove the converse part of Theorem 9. Our proof is similar to Cuff's converse proof for the non-feedback version of the classical reverse Shannon theorem [17]. Figure 2 can serve as a depiction of the most general protocol for a non-feedback simulation, if we ignore the decoding on Alice's side to produce $X'^n$. The non-feedback protocol begins with Alice and the reference sharing many copies of the state $\phi^{RA}$ and Alice sharing common randomness $M$ with Bob. She then chooses a quantum instrument $\Upsilon^{(m)}$ based on the common randomness $M$ and performs it on her systems $A^n$. The measurement returns outcome $L$, and the overall state is as follows:



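$$\omega^{R^nA^nLM} \equiv \sum_{l,m}\frac{1}{|\mathcal{M}|}\,(\Upsilon^{(m)}_l)^{A^n}\bigl((\phi^{RA})^{\otimes n}\bigr)\otimes|l\rangle\langle l|^L\otimes|m\rangle\langle m|^M,$$

where $\Upsilon^{(m)}_l$ is a completely positive, trace non-increasing map. Alice sends the register $L$ to Bob. Based on $L$ and $M$, he performs some stochastic map $p_{X^n|L,M}(x^n|l,m)$ to give his estimate $x^n$ of the measurement outcome. The resulting state is as follows:

$$\omega^{R^nA^nLMX^n} = \sum_{l,m,x^n}\frac{1}{|\mathcal{M}|}\,p_{X^n|L,M}(x^n|l,m)\,(\Upsilon^{(m)}_l)^{A^n}\bigl((\phi^{RA})^{\otimes n}\bigr)\otimes|l\rangle\langle l|^L\otimes|m\rangle\langle m|^M\otimes|x^n\rangle\langle x^n|^{X^n}.$$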



The following condition should hold for all $\epsilon>0$ and sufficiently large $n$ for a faithful non-feedback simulation:



R 



n\ /„n|X™ 



i < e 



We prove the first bound as follows: 



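$$\begin{aligned}
nR &\ge H(L)_\omega\\
&\ge I(L;MR^n)_\omega\\
&= I(LM;R^n)_\omega + I(L;M)_\omega - I(M;R^n)_\omega\\
&\ge I(LM;R^n)_\omega\\
&= H(R^n)_\omega - H(R^n|LM)_\omega\\
&\ge \sum_k\bigl[H(R_k)_\omega - H(R_k|LM)_\omega\bigr]\\
&= \sum_k I(LM;R_k)_\omega\\
&= nI(LM;R|K)_\sigma\\
&\ge nI(LM;R|K)_\sigma + nI(R;K)_\sigma - n\epsilon'\\
&= nI(LMK;R)_\sigma - n\epsilon'.
\end{aligned}$$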

The first two inequalities follow for reasons similar to the first few steps of our previous converse. The first equality is an identity for quantum mutual information. The third inequality follows because $I(L;M)_\omega\ge0$ and because there are no correlations between $R^n$ and $M$, so that $I(M;R^n)_\omega = 0$. The second equality is an identity for quantum mutual information. The fourth inequality follows from subadditivity of quantum entropy:

$$H(R^n|LM)_\omega\le\sum_k H(R_k|LM)_\omega,$$

and because the state on $R^n$ is a tensor-power state so that

$$H(R^n)_\omega = \sum_k H(R_k)_\omega.$$
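
The mutual-information identity used in the first equality, $I(L;MR^n) = I(LM;R^n) + I(L;M) - I(M;R^n)$, holds for any joint state. As a sanity check, the sketch below verifies it numerically for a randomly generated classical distribution over three bits standing in for $(L, M, R)$. This is an illustration on classical variables only, and the helper names are hypothetical.

```python
import random
from math import log2
from itertools import product

def H(joint, keep):
    """Shannon entropy (bits) of the marginal of `joint` on the index positions in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# random joint distribution over (l, m, r), each a single bit
rng = random.Random(1)
outcomes = list(product((0, 1), repeat=3))
weights = [rng.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

def I(a, b):
    """Mutual information between the variable groups a and b (lists of positions)."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)

L_, M_, R_ = [0], [1], [2]
lhs = I(L_, M_ + R_)
rhs = I(L_ + M_, R_) + I(L_, M_) - I(M_, R_)
```

Both sides expand to $H(L) + H(MR) - H(LMR)$, so `lhs` and `rhs` agree up to floating-point error.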






The third equality is another identity. The fourth equality comes about by defining the state $\sigma$ as follows:



a 



RALMXK 



Lm,k,x 



\l)(l\ L ® \m)(m\ M ® \x)(x\ x <g> \k)(k\ K , (55) 



\K 



and exploiting the fact that $K$ is a uniform classical random variable, with distribution $1/n$, determining which systems $R_kA_kX_k$ to select. (The notation $\mathrm{Tr}_{R_i^j}$ with $i<j$ indicates to trace over systems $R_i\cdots R_j$.) From the fact that the measurement simulation is faithful, we can apply the Alicki-Fannes inequality to conclude that

$$I(RX;K)_\sigma = \bigl|I(RX;K)_\sigma - I(RX;K)_\tau\bigr|\le\epsilon', \tag{56}$$

where $\tau$ is a state like $\sigma$ but resulting from the tensor-power state for ideal measurement compression (and due to its IID structure, it has no correlations with any particular system $k$, so that $I(RX;K)_\tau = 0$). The above also implies that

$$I(R;K)_\sigma\le\epsilon',$$

by quantum data processing. The final equality is an application of the chain rule for quantum mutual information. The state $\sigma$ for the final information term has the form in (50) with $LMK = W$, the distribution $p_{X|W}(x|w)$ as

$$p_{X_k|L,M}(x|l,m),$$

and the completely positive, trace non-increasing maps $\mathcal{M}_w$ defined by



1 



7 Tr .fe_i 



n\M \ ~ A i A k+i 



{< 



T 



(m),A n ((±A\®k-l 



Also, observe that $R - (LMK) - X$ forms a quantum Markov chain.

We now prove the second bound:

$$\begin{aligned}
n(R+S) &\ge H(LM)_\omega\\
&\ge I(LM;X^nR^n)_\omega\\
&= H(X^nR^n)_\omega - H(X^nR^n|LM)_\omega\\
&\ge \sum_k\bigl[H(X_kR_k)_\omega - H(X_kR_k|LM)_\omega\bigr] - n\epsilon'\\
&= \sum_k I(LM;X_kR_k)_\omega - n\epsilon'\\
&= nI(LM;XR|K)_\sigma - n\epsilon'\\
&\ge nI(LM;XR|K)_\sigma + nI(K;XR)_\sigma - 2n\epsilon'\\
&= nI(LMK;XR)_\sigma - 2n\epsilon'.
\end{aligned}$$

The first two inequalities follow for reasons similar to those for our previous inequalities. The first equality is an identity. The third inequality follows from subadditivity of entropy:

$$H(X^nR^n|LM)_\omega\le\sum_k H(X_kR_k|LM)_\omega,$$

and from the fact that the measurement simulation is faithful so that

$$\Bigl|H(X^nR^n)_\omega - \sum_k H(X_kR_k)_\omega\Bigr|\le n\epsilon',$$

where we have applied Lemma 10 below. The second equality is an identity. The third equality follows by considering the state $\sigma$ as defined in (55). The fourth inequality follows from (56). The final equality is the chain rule for quantum mutual information. As stated above, the state $\sigma$ has the form in (50).



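Lemma 10 Suppose that a state $\rho^{A^n}$ is $\epsilon$-close in trace distance to an IID state $(\sigma^A)^{\otimes n}$:

$$\bigl\|\rho^{A^n} - (\sigma^A)^{\otimes n}\bigr\|_1\le\epsilon. \tag{57}$$

Then the entropy $H(A^n)_\rho$ of $\rho^{A^n}$ and the sum of the entropies $H(A_k)_\rho$ of its marginals are close in the following sense:

$$\Bigl|H(A^n)_\rho - \sum_k H(A_k)_\rho\Bigr|\le 2n\epsilon\log|A| + (n+1)H_2(\epsilon).$$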



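Proof. Apply the Fannes-Audenaert inequality to (57) to obtain

$$\bigl|H(A^n)_\rho - H(A^n)_{\sigma^{\otimes n}}\bigr|\le\epsilon n\log|A| + H_2(\epsilon). \tag{58}$$

The following inequality also follows by applying monotonicity of trace distance to (57):

$$\bigl\|\rho^{A_k} - \sigma^A\bigr\|_1\le\epsilon,$$

which then gives that

$$\bigl|H(A_k)_\rho - H(A)_\sigma\bigr|\le\epsilon\log|A| + H_2(\epsilon),$$

by again applying the Fannes-Audenaert inequality. Summing these over all $k$ then gives that

$$\sum_{k=1}^n\bigl|H(A)_\sigma - H(A_k)_\rho\bigr|\le n\epsilon\log|A| + nH_2(\epsilon). \tag{59}$$

Applying the triangle inequality to (58) and (59) gives the desired result:

$$\begin{aligned}
2n\epsilon\log|A| + (n+1)H_2(\epsilon) &\ge \bigl|H(A^n)_\rho - H(A^n)_{\sigma^{\otimes n}}\bigr| + \sum_{k=1}^n\bigl|H(A)_\sigma - H(A_k)_\rho\bigr|\\
&\ge \Bigl|H(A^n)_\rho - H(A^n)_{\sigma^{\otimes n}} + \sum_{k=1}^n\bigl[H(A)_\sigma - H(A_k)_\rho\bigr]\Bigr|\\
&= \Bigl|H(A^n)_\rho - H(A^n)_{\sigma^{\otimes n}} + H(A^n)_{\sigma^{\otimes n}} - \sum_{k=1}^n H(A_k)_\rho\Bigr|\\
&= \Bigl|H(A^n)_\rho - \sum_{k=1}^n H(A_k)_\rho\Bigr|.
\end{aligned}$$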
4 Classical data compression with quantum side information



We now turn to the third protocol of this paper: classical data compression with quantum side information. We discuss this protocol in detail because it is a step along the way to constructing our protocol for measurement compression with quantum side information (and we construct this latter protocol in a particular way). Devetak and Winter first proved achievability and optimality of a protocol for this task [27]. They proved this result by appealing to Winter's proof of the classical capacity theorem [63] and a standard recursive code construction argument of Csiszár and Körner [TB]. Renes et al. later gave a proof of this protocol by exploiting two-universal hash functions and a square-root measurement [51, 54] (the first paper proved the IID version and the latter the "one-shot" case). Renes et al. further explored a connection between this protocol and privacy amplification by considering entropic uncertainty relations [50, 53].

Our development here contains a review of this information processing task and the statement of the theorem, in addition to providing novel proofs of both the achievability part and the converse that are direct quantum generalizations of the well-known approaches in Refs. [15, 30] for the Slepian-Wolf problem [58]. The encoder in our achievability proof bears some similarities with those in Refs. [151 EQl E3]: the protocol has the sender first hash the received sequence and send the hash along to the receiver. The receiver then employs a sequential quantum decoder: he searches sequentially among all the possible quantum states that are consistent with the hash in order to determine the sequence emitted by the source. The main tool employed in the error analysis is Sen's non-commutative union bound [56]. A potential advantage of a sequential decoding approach is that it might lead to physical implementations of these protocols for small block sizes, along the lines discussed in Ref. [6T].

4.1 Information processing task for CDC with QSI 

We now discuss the general information processing task. Consider an ensemble $\{p_X(x),\rho_x\}$. Suppose that a source issues a random sequence $X^n$ to Alice, distributed according to the IID distribution $p_{X^n}(x^n)$, while also issuing the correlated quantum state $\rho_{x^n}$ to Bob. Their joint state is described by the ensemble $\{p_{X^n}(x^n),\rho_{x^n}\}$, or equivalently, by a classical-quantum state of the following form:

$$\sum_{x^n}p_{X^n}(x^n)\,|x^n\rangle\langle x^n|^{X^n}\otimes\rho_{x^n}^{B^n}.$$

The goal is for Alice to communicate the particular sequence $x^n$ that the source issues, using as few bits as possible.

One potential strategy is to exploit Shannon compression: just compress the sequence to $nH(X)$ bits, keeping only the typical set according to the distribution $p_{X^n}(x^n)$. But they can actually do much better in general if Bob exploits his quantum side information in the form of the correlated state $\rho_{x^n}$.
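
To see how large the gain over Shannon compression can be, it helps to write the conditional entropy of the classical-quantum state explicitly (an added worked step based on the standard formula for the entropy of a classical-quantum state):

$$H(X|B) = H(XB) - H(B) = H(X) + \sum_x p_X(x)\,H(\rho_x) - H\Bigl(\sum_x p_X(x)\,\rho_x\Bigr).$$

Two extreme cases illustrate the range. If the states $\rho_x$ have mutually orthogonal supports, then $H\bigl(\sum_x p_X(x)\rho_x\bigr) = H(X) + \sum_x p_X(x)H(\rho_x)$, so that $H(X|B) = 0$ and no communication is needed asymptotically. If all of the $\rho_x$ are equal, the side information is useless, $H(X|B) = H(X)$, and the task reduces to Shannon compression.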

Figure 4: Ideal protocol for classical data compression with quantum side information. In this protocol, we assume that a quantum information source distributes many copies of a classical-quantum state to Alice and Bob, such that Alice receives the classical part and Bob receives the quantum part. The goal is for Alice to communicate the classical sequence received from the source to Bob. In an ideal case, she would simply transmit this sequence to Bob. It is possible, though, to obtain significant savings in communication by allowing for an asymptotically vanishing error and for Bob to infer something about the classical sequence from his correlated quantum states.

The most general protocol has Alice hash her sequence $x^n$ to some variable $L\in\mathcal{L}$ (this is just some many-to-one mapping $f:\mathcal{X}^n\to\mathcal{L}$). She transmits the variable $L$ to Bob over a noiseless classical channel using $\log_2|\mathcal{L}|$ bits. Bob then exploits the hashed variable $L$ and his quantum side information $\rho_{x^n}$ to distinguish between all of the possible states that are consistent with the hash $L$ (i.e., his action will be some quantum measurement depending on the hash $L$). The output of his measurement is some approximation sequence $\hat{X}^n$. The protocol has one parameter that characterizes its quality. We demand that the state $\sigma^{X^n\hat{X}^nB^n}$ after Alice and Bob's actions should be close in trace distance to an ideal state $\rho^{X^n\bar{X}^nB^n}$, where $\bar{X}^n$ is a copy of $X^n$ (this would be the ideal output if Alice were to just send a copy of the variable $X^n$ to Bob):

$$\bigl\|\rho^{X^n\bar{X}^nB^n} - \sigma^{X^n\hat{X}^nB^n}\bigr\|_1\le\epsilon. \tag{60}$$

The above specifies an $(n,R,\epsilon)$ code for this task, where $R = \log_2|\mathcal{L}|/n$.

A rate $R$ is achievable if there exists an $(n,R,\epsilon)$ code for all $\epsilon>0$ and sufficiently large $n$.
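
The classical part of this protocol, namely Alice's hash and the list of sequences that Bob must discriminate among, can be sketched as follows. This is a toy illustration under explicit assumptions: the hash is modeled as a uniformly random binning (one possible many-to-one map $f$, not necessarily the one used in the proofs), the names `make_hash` and `candidates` are hypothetical, and Bob's quantum measurement on his side information is not simulated.

```python
import random
from itertools import product

def make_hash(alphabet, n, num_bins, seed=0):
    """Assign each sequence in alphabet^n to a uniformly random bin (random binning)."""
    rng = random.Random(seed)
    return {seq: rng.randrange(num_bins) for seq in product(alphabet, repeat=n)}

def candidates(f, l):
    """All sequences consistent with the received hash value l."""
    return [seq for seq, b in f.items() if b == l]

f = make_hash(alphabet=(0, 1), n=6, num_bins=8)
x_n = (0, 1, 1, 0, 1, 0)         # sequence issued to Alice by the source
l = f[x_n]                       # Alice sends l using log2(8) = 3 bits, so R = 3/6
consistent = candidates(f, l)    # sequences Bob must distinguish using his quantum state
```

The smaller the number of bins, the fewer bits Alice sends but the longer Bob's list of candidates becomes, which is why the quality of his quantum side information determines the achievable rate.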

4.2 Classical data compression with QSI theorem 

Theorem 11 (Classical data compression with quantum side information) Suppose that

$$\rho^{XB} \equiv \sum_x p_X(x)\, |x\rangle\langle x|^X \otimes \rho_x^B$$

is a classical-quantum state that characterizes a classical-quantum source. Then the conditional von
Neumann entropy $H(X|B)$ is the smallest possible achievable rate for classical data compression
with quantum side information for this source:

$$\inf\{R \mid R \text{ is achievable}\} = H(X|B).$$

4.3 Achievability proof for CDC with QSI 

The resource inequality for this communication task is as follows: 



$$\langle \rho^{XB} \rangle + H(X|B)\,[c \to c] \ge \langle \rho^{X X_B B} \rangle,$$
Figure 5: Classical data compression with QSI protocol. The protocol begins with the source distributing a
random classical sequence to Alice and a correlated quantum state to Bob. Alice begins by hashing the sequence to a
variable $L$ with some hash function $f$. She then transmits the variable $L$ to Bob, using $\log_2 |\mathcal{L}|$ noiseless classical bit
channels. Bob receives the hash, and he then enumerates all of the sequences $x^n$ that are consistent with the hash
(so that $f(x^n) = l$). He performs a "quantum scan" over all of the quantum states $\rho_{x^n}$ that are consistent with the
received hash. This quantum scan amounts to a sequence of binary quantum measurements, effectively asking, "Does
my quantum state correspond to the first sequence consistent with the hash? To the second? etc." After receiving the
answer "yes" to one of these questions, he declares the "yes" sequence to be the one sent from Alice. This strategy
has asymptotically vanishing error as long as the size of the hash is at least $nH(X|B)$ bits.



the meaning being that if Alice and Bob share many copies of the state $\rho^{XB}$ and she communicates
at a rate $H(X|B)$ to Bob, then they can construct the state $\rho^{X X_B B}$, so that Bob has a copy of the
variable $X$.

The strategy for achievability is for Alice to hash her sequence $X^n$ to some variable $L$. Bob then
receives the variable $L$ after Alice communicates it to him. He then "scans" over all of the quantum
states $\rho_{x^n}$ that are consistent with the hash and such that the sequence $x^n$ is typical (the strategy
essentially disregards the atypical sequences $x^n$ since their total probability mass is asymptotically
negligible). He can accomplish this "scan" by performing a sequential decoding strategy [32, 55],
which consists of binary tests of the form, "Is this state consistent with the hash? Or this one?
etc." He performs these tests until he receives a "yes" answer in one of his measurements.

The intuition for why $H(X|B)$ should be the ultimate rate of communication is that there are
$\approx 2^{nH(X)}$ sequences of the source to account for (the typical ones). From the HSW theorem [35, 55],
we know that the maximal number of sequences that Bob can distinguish is $\approx 2^{nI(X;B)}$. Thus, if
Alice divides the source sequences into $\approx 2^{nH(X)} / 2^{nI(X;B)} = 2^{nH(X|B)}$ groups and sends the label
of the group, then Bob should be able to determine which sequence $x^n$ is the one that the source
issued.

Detailed Strategy. More formally, the encoding strategy is as follows. Alice and Bob are
allowed to have an agreed-upon hash function $f : \mathcal{X}^n \to \mathcal{L}$, selected at random from a two-universal
family. A hash function $f$ has a collision if two differing sequences $x_1^n, x_2^n \in \mathcal{X}^n$ hash to the same
value:

$$x_1^n \neq x_2^n \quad \text{and} \quad f(x_1^n) = f(x_2^n).$$

A two-universal family has the property that the probability of a collision is no larger than that for a
uniformly random function (where the probability is with respect to the random choice of the hash
function):

$$x_1^n \neq x_2^n \;\Rightarrow\; \Pr_f\{f(x_1^n) = f(x_2^n)\} \le \frac{1}{|\mathcal{L}|} = 2^{-nR}. \tag{61}$$

Such a strategy is equivalent to the "random binning" strategy often discussed in information 
theory texts [151 EOT] . 
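
As an illustration of the two-universal property in (61), the following minimal sketch checks it exhaustively for one simple family that is not taken from this paper: all linear maps over GF(2) from n-bit strings to k-bit strings, each specified by a k-by-n binary matrix. The function names `linear_hash` and `collision_fraction` and the toy sizes are assumptions made purely for illustration. For any two distinct inputs, exactly a fraction $2^{-k}$ of the matrices produce a collision, which matches the bound $1/|\mathcal{L}|$ with $|\mathcal{L}| = 2^k$.

```python
from itertools import product


def linear_hash(matrix, bits):
    """Apply a k-by-n binary matrix to an n-bit tuple over GF(2)."""
    return tuple(sum(m * b for m, b in zip(row, bits)) % 2 for row in matrix)


def collision_fraction(x1, x2, n, k):
    """Fraction of all k-by-n binary matrices under which x1 and x2 collide.

    Enumerating every matrix corresponds to drawing the hash function
    uniformly at random from this (illustrative) family.
    """
    total = 0
    collisions = 0
    for flat in product((0, 1), repeat=n * k):
        # Reshape the flat tuple of n*k bits into k rows of length n.
        matrix = tuple(flat[i * n:(i + 1) * n] for i in range(k))
        total += 1
        if linear_hash(matrix, x1) == linear_hash(matrix, x2):
            collisions += 1
    return collisions / total


if __name__ == "__main__":
    # Two distinct 3-bit sequences hashed to 2 bits: expect 2**-2.
    print(collision_fraction((1, 0, 1), (0, 1, 1), n=3, k=2))
```

Exhaustive enumeration is only feasible for toy sizes; in the setting of the text one would instead sample a single hash function from the family and rely on the bound in (61).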

Upon receiving the hash value $l$, Bob performs a sequence of binary measurements $\{\Pi_{x^n}, I - \Pi_{x^n}\}$
for all the sequences $x^n$ that are consistent with the hash value (so that $f(x^n) = l$) and such that
they are strongly typical ($x^n \in T_\delta^{X^n}$; see Appendix A for details). We define the set $A(f,l)$ to
capture these sequences:

$$A(f,l) \equiv \{x^n : f(x^n) = l,\ x^n \in T_\delta^{X^n}\}. \tag{62}$$

The projector $\Pi_{x^n}$ is a strong conditionally typical projector (see Appendix A), with the property
that

$$\mathrm{Tr}\{\Pi_{x^n}\, \rho_{x^n}\} \ge 1 - \epsilon,$$

for all $\epsilon > 0$ and sufficiently large $n$. From the above property, we would expect these measurements
to perform well in identifying the actual state transmitted.

Error Analysis. We define the error probability as follows:

$$\Pr\{\text{``error @ decoder''}\} \equiv \sum_{x^n} p_{X^n}(x^n)\, \Pr\{\text{``error @ decoder''} \mid x^n\}.$$

It is then clear that we can focus on the typical sequences $x^n$ because the above error probability
is equal to

$$\sum_{x^n \in T_\delta^{X^n}} p_{X^n}(x^n)\, \Pr\{\text{``error @ decoder''} \mid x^n\} + \sum_{x^n \notin T_\delta^{X^n}} p_{X^n}(x^n)\, \Pr\{\text{``error @ decoder''} \mid x^n\}$$
$$\le \sum_{x^n \in T_\delta^{X^n}} p_{X^n}(x^n)\, \Pr\{\text{``error @ decoder''} \mid x^n\} + \epsilon. \tag{63}$$

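Now we consider the error term $\Pr\{\text{``error @ decoder''} \mid x^n\}$. Let $a_1^{(l)}, \ldots, a_{|A|}^{(l)}$ enumerate all
of the sequences in the set $A(f,l)$ defined in (62) (those sequences consistent with the hash $l$). Let
$a_m^{(l)}$ be the actual sequence $x^n$ that the source issues. Then the probability for a correct decoding
for Bob is as follows:

$$\mathrm{Tr}\left\{ \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \rho_{a_m^{(l)}}\, \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \right\},$$

where $\hat{\Pi}_{x^n} \equiv I - \Pi_{x^n}$, so that the binary tests give a response of "no" until the test for $a_m^{(l)}$ gives a
response of "yes." The probability for incorrectly decoding with this strategy is

$$1 - \mathrm{Tr}\left\{ \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \rho_{a_m^{(l)}}\, \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \right\},$$

so that we can write the error probability in (63) as

$$\sum_{l} \sum_{a_m^{(l)} \in A(f,l)} p_{X^n}(a_m^{(l)}) \left( 1 - \mathrm{Tr}\left\{ \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \rho_{a_m^{(l)}}\, \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \right\} \right). \tag{64}$$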
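We can rewrite this error probability as

$$\sum_{l} \sum_{a_m^{(l)} \in A(f,l)} p_{X^n}(a_m^{(l)})\, \mathrm{Tr}\left\{ \left(I - \Theta_{a_m^{(l)}}\right) \rho_{a_m^{(l)}} \right\},$$

where we define the POVM element $\Theta_{a_m^{(l)}}$ as

$$\Theta_{a_m^{(l)}} \equiv \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}}\, \Pi_{a_m^{(l)}}\, \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}.$$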

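Using the facts that (see Appendix A)

$$1 = \mathrm{Tr}\{\rho_{a_m^{(l)}}\} = \mathrm{Tr}\{\Pi \rho_{a_m^{(l)}}\} + \mathrm{Tr}\{(I - \Pi) \rho_{a_m^{(l)}}\} \le \mathrm{Tr}\{\Pi \rho_{a_m^{(l)}}\} + \epsilon,$$

where $\Pi$ is the typical projector for the average state $\sum_x p_X(x) \rho_x$, and

$$\mathrm{Tr}\left\{ \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \rho_{a_m^{(l)}}\, \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \right\}$$
$$= \mathrm{Tr}\left\{ \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \rho_{a_m^{(l)}} \right\}$$
$$\ge \mathrm{Tr}\left\{ \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \Pi \rho_{a_m^{(l)}} \Pi \right\} - \left\| \rho_{a_m^{(l)}} - \Pi \rho_{a_m^{(l)}} \Pi \right\|_1$$
$$\ge \mathrm{Tr}\left\{ \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \Pi \rho_{a_m^{(l)}} \Pi \right\} - 2\sqrt{\epsilon}, \tag{65}$$

we can bound the expression in (64) from above by

$$\sum_{l} \sum_{a_m^{(l)} \in A(f,l)} p_{X^n}(a_m^{(l)}) \left( \mathrm{Tr}\{\Pi \rho_{a_m^{(l)}} \Pi\} - \mathrm{Tr}\left\{ \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \Pi \rho_{a_m^{(l)}} \Pi\, \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}} \right\} \right) \tag{66}$$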

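(with the other terms $\epsilon + 2\sqrt{\epsilon}$ omitted for simplicity). We now apply Sen's non-commutative union
bound [56] (Lemma 18 in Appendix B) along with concavity of square-root to obtain the following
upper bound:

$$2 \sqrt{ \sum_{l} \sum_{a_m^{(l)} \in A(f,l)} p_{X^n}(a_m^{(l)}) \left[ \mathrm{Tr}\left\{ \left(I - \Pi_{a_m^{(l)}}\right) \Pi \rho_{a_m^{(l)}} \Pi \right\} + \sum_{i=1}^{m-1} \mathrm{Tr}\left\{ \Pi_{a_i^{(l)}}\, \Pi \rho_{a_m^{(l)}} \Pi \right\} \right] }.$$

For the first term in the square-root, we have that

$$\mathrm{Tr}\left\{ \left(I - \Pi_{a_m^{(l)}}\right) \Pi \rho_{a_m^{(l)}} \Pi \right\} \le \mathrm{Tr}\left\{ \left(I - \Pi_{a_m^{(l)}}\right) \rho_{a_m^{(l)}} \right\} + \left\| \rho_{a_m^{(l)}} - \Pi \rho_{a_m^{(l)}} \Pi \right\|_1 \le \epsilon + 2\sqrt{\epsilon}. \tag{67}$$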
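For the second term in the square-root, we have

$$\sum_{l} \sum_{a_m^{(l)} \in A(f,l)} p_{X^n}(a_m^{(l)}) \sum_{i=1}^{m-1} \mathrm{Tr}\left\{ \Pi_{a_i^{(l)}}\, \Pi \rho_{a_m^{(l)}} \Pi \right\}$$
$$\le \sum_{l} \sum_{a_m^{(l)} \in A(f,l)} p_{X^n}(a_m^{(l)}) \sum_{i \neq m} \mathrm{Tr}\left\{ \Pi_{a_i^{(l)}}\, \Pi \rho_{a_m^{(l)}} \Pi \right\}$$
$$= \sum_{x^n \in T_\delta^{X^n}} p_{X^n}(x^n) \sum_{x'^n \in T_\delta^{X^n} :\, x'^n \neq x^n} \mathcal{I}\left(f(x'^n) = f(x^n)\right) \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho_{x^n} \Pi \right\}$$
$$\le \sum_{x^n} p_{X^n}(x^n) \sum_{x'^n \in T_\delta^{X^n} :\, x'^n \neq x^n} \mathcal{I}\left(f(x'^n) = f(x^n)\right) \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho_{x^n} \Pi \right\}.$$

The first inequality follows by summing over all the indices not equal to $m$. The equality follows by
introducing the indicator function $\mathcal{I}(f(x'^n) = f(x^n))$, and the last inequality follows by summing
over all sequences $x^n$.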

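We now analyze the expectation of the error probability, with respect to the random hash
function $f$. (We can imagine that this expectation was there from the beginning of the analysis,
and apply concavity of square-root to bring it over this second term.) This leads to

$$\mathbb{E}_f\left\{ \sum_{x^n} p_{X^n}(x^n) \sum_{x'^n \in T_\delta^{X^n} :\, x'^n \neq x^n} \mathcal{I}\left(f(x'^n) = f(x^n)\right) \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho_{x^n} \Pi \right\} \right\}$$
$$= \sum_{x^n} p_{X^n}(x^n) \sum_{x'^n \in T_\delta^{X^n} :\, x'^n \neq x^n} \mathbb{E}_f\left\{ \mathcal{I}\left(f(x'^n) = f(x^n)\right) \right\} \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho_{x^n} \Pi \right\}$$
$$= \sum_{x^n} p_{X^n}(x^n) \sum_{x'^n \in T_\delta^{X^n} :\, x'^n \neq x^n} \Pr_f\left\{ f(x'^n) = f(x^n) \right\} \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho_{x^n} \Pi \right\}$$
$$\le 2^{-nR} \sum_{x^n} p_{X^n}(x^n) \sum_{x'^n \in T_\delta^{X^n}} \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho_{x^n} \Pi \right\},$$

where the inequality follows from the two-universal property in (61), the fact that $R = \log_2 |\mathcal{L}| / n$,
and by summing over all sequences $x'^n$ in the typical set. Continuing, we have

$$= 2^{-nR} \sum_{x'^n \in T_\delta^{X^n}} \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \left[ \sum_{x^n} p_{X^n}(x^n) \rho_{x^n} \right] \Pi \right\}$$
$$= 2^{-nR} \sum_{x'^n \in T_\delta^{X^n}} \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \rho^{\otimes n} \Pi \right\}$$
$$\le 2^{-nR}\, 2^{-n[H(B) - \delta]} \sum_{x'^n \in T_\delta^{X^n}} \mathrm{Tr}\left\{ \Pi_{x'^n}\, \Pi \right\}$$
$$\le 2^{-nR}\, 2^{-n[H(B) - \delta]} \sum_{x'^n \in T_\delta^{X^n}} \mathrm{Tr}\left\{ \Pi_{x'^n} \right\}$$
$$\le 2^{-nR}\, 2^{-n[H(B) - \delta]}\, 2^{n[H(B|X) + \delta]}\, 2^{n[H(X) + \delta]}$$
$$= 2^{-n[R - H(X|B) - 3\delta]}.$$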
The first inequality follows from the operator inequality $\Pi \rho^{\otimes n} \Pi \le 2^{-n[H(B) - \delta]}\, \Pi$, and the second
from $\Pi \le I$. The final inequality follows from the bounds $\mathrm{Tr}\{\Pi_{x'^n}\} \le 2^{n[H(B|X) + \delta]}$ and $|T_\delta^{X^n}| \le 2^{n[H(X) + \delta]}$.
Collecting everything, the overall error probability is bounded by

$$\epsilon' \equiv \epsilon + 2\sqrt{\epsilon} + 2\sqrt{\epsilon + 2\sqrt{\epsilon} + 2^{-n[R - H(X|B) - 3\delta]}}. \tag{68}$$

Since the expectation of the above error probability is small (where the expectation is with
respect to the random choice of hash function), there exists some particular hash function from the
family such that the above inequality is true. Thus, as long as $R = H(X|B) + 4\delta$, we can guarantee
that the error probability of the scheme is arbitrarily small.

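We now argue that it is possible to make the state after the decoding be arbitrarily close to the
initial state, so that the condition in (60) holds. After recovering the sequence $x^n = a_m^{(l)}$ issued by
the source, Bob can place it in a classical register, and the post-measurement state from sequential
decoding has the following form:

$$\frac{1}{\mathrm{Tr}\left\{ \Theta_{a_m^{(l)}}\, \rho_{a_m^{(l)}} \right\}}\, \Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}\, \rho_{a_m^{(l)}}\, \hat{\Pi}_{a_1^{(l)}} \cdots \hat{\Pi}_{a_{m-1}^{(l)}} \Pi_{a_m^{(l)}},$$

with $\Theta_{a_m^{(l)}}$ defined in (65) and assuming a correct decoding. The operators $\Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}}$ and
$\sqrt{\Theta_{a_m^{(l)}}}$ are related by a left polar decomposition:

$$\Pi_{a_m^{(l)}} \hat{\Pi}_{a_{m-1}^{(l)}} \cdots \hat{\Pi}_{a_1^{(l)}} = U_{a_m^{(l)}} \sqrt{\Theta_{a_m^{(l)}}},$$

for some unitary $U_{a_m^{(l)}}$. So after Bob recovers the sequence $a_m^{(l)}$, he applies the unitary $U^\dagger_{a_m^{(l)}}$, and
the state becomes as follows:

$$\frac{1}{\mathrm{Tr}\left\{ \Theta_{a_m^{(l)}}\, \rho_{a_m^{(l)}} \right\}}\, \sqrt{\Theta_{a_m^{(l)}}}\; \rho_{a_m^{(l)}}\, \sqrt{\Theta_{a_m^{(l)}}}.$$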

We can now show that the condition in (60) holds for this decoding procedure (including the
unitaries $U^\dagger_{a_m^{(l)}}$). Consider that for all typical sequences $x^n \in T_\delta^{X^n}$, the trace distance between the
initial state and the post-measurement state has the following bound:

$$\left\| \rho_{x^n} - \left(\mathrm{Tr}\{\Theta_{x^n} \rho_{x^n}\}\right)^{-1} \sqrt{\Theta_{x^n}}\, \rho_{x^n} \sqrt{\Theta_{x^n}} \right\|_1 \le 2\sqrt{\mathrm{Tr}\{(I - \Theta_{x^n}) \rho_{x^n}\}},$$

which follows from the Gentle Measurement Lemma (Lemma 9.4.1 of Ref. [60]). Combining this
bound with the fact that there is no measurement when $x^n \notin T_\delta^{X^n}$ and defining $\Theta_{x^n} = I$ in this
case, averaging over $p_{X^n}(x^n)$, and then applying our bound in (68) and concavity of square-root,
we obtain the following upper bound:

$$\sum_{x^n} p_{X^n}(x^n) \left\| \rho_{x^n} - \left(\mathrm{Tr}\{\Theta_{x^n} \rho_{x^n}\}\right)^{-1} \sqrt{\Theta_{x^n}}\, \rho_{x^n} \sqrt{\Theta_{x^n}} \right\|_1 \le 2\sqrt{\epsilon'}.$$

It then follows that the condition in (60) is satisfied for this protocol.

4.4 Converse theorem for CDC with QSI

This section provides a simple proof of the converse theorem for CDC-QSI. The converse
demonstrates that the single-letter rate in Theorem 11 is optimal. An inspection of the proof reveals a
close similarity with the Slepian-Wolf converse in Ref. [30].

In the most general protocol for this task, Alice receives the sequence $X^n$ from the source. She
then hashes it to a random variable $L$ where $f(X^n) = L$ and sends it over to Bob via some noiseless
classical bit channels. Bob receives $L$, processes it and $B^n$ to obtain an estimate $\hat{X}^n$ of $X^n$. If the
protocol is any good for this task, then the actual state $\omega^{X^n \hat{X}^n B^n}$ at the end should be $\epsilon$-close in
trace distance to the ideal state $\sigma^{X^n \overline{X}^n B^n}$, where $\overline{X}^n$ is a copy of the variable $X^n$:

$$\left\| \omega^{X^n \hat{X}^n B^n} - \sigma^{X^n \overline{X}^n B^n} \right\|_1 \le \epsilon. \tag{69}$$



A proof of the converse goes as follows:

$$nR \ge H(L)$$
$$\ge H(L|B^n)$$
$$= I(X^n; L|B^n) + H(L|B^n X^n)$$
$$\ge I(X^n; L|B^n)$$
$$= H(X^n|B^n) - H(X^n|L B^n)$$
$$\ge H(X^n|B^n)_\omega - H(X^n|\hat{X}^n)_\omega$$
$$\ge H(X^n|B^n)_\sigma - n\epsilon'$$
$$= nH(X|B) - n\epsilon'.$$

The first two inequalities follow for reasons similar to those in the previous converse in Section 2.4.
The first equality is an entropy identity, and the third inequality follows because $H(L|B^n X^n) \ge 0$
for a classical variable $L$. The second equality is another entropy identity. The fourth inequality
follows from quantum data processing of $L$ and $B^n$ to obtain the estimate $\hat{X}^n$. The final inequality
follows from the condition in (69), continuity of entropy (Alicki-Fannes' inequality [3]), and the
fact that $H(X^n|\overline{X}^n)_\sigma = 0$ since $\overline{X}^n$ is a copy of $X^n$, with $\epsilon'$ being some function $g(\epsilon)$ such that
$\lim_{\epsilon \to 0} g(\epsilon) = 0$. The final equality follows because the entropy is additive on a tensor-power state.



5 Measurement compression with quantum side information 

We now discuss another new protocol: measurement compression with quantum side information 
(MC-QSI). The information processing task for this protocol is similar to that in measurement 
compression (Section 2), with the exception that they are to perform the protocol on the $A$ system
of some bipartite state, and Bob is allowed to use his system B in order to reduce the communi- 
cation resources needed for the simulation. The protocol discussed in this section is a "feedback" 
simulation, in which the sender also obtains the outcome of the measurement simulation. After 
reviewing the information processing task, we state the MC-QSI with feedback theorem and prove 
achievability of the protocol and its converse. Finally, we discuss several applications of it. 

Figure 6: Ideal measurement compression with quantum side information. The ideal protocol to which
we should compare performance of any actual protocol. The sender and receiver share many copies of some bipartite
state $\rho^{AB}$. Alice performs the measurement $\Lambda = \{\Lambda_x\}$ locally on each of her shares and sends the outcomes to Bob.
A simulation of this measurement would have the sender and receiver operate according to some procedure that is
statistically indistinguishable from this ideal case.



5.1 Information processing task for MC-QSI 

The information processing task in this case is a straightforward extension of that for measurement 
compression with feedback. As such, we leave the discussion of it to the captions of Figures 6 and 7.
One point to observe from the figures is that in the ideal implementation of the measurement, the 
side information in system B is left untouched. As a result, the measurement compression protocol 
will be permitted to use system B, but only in ways that do not significantly disturb it. 

5.2 Measurement compression with quantum side information theorem 

Theorem 12 (Measurement compression with QSI) Let $\rho^{AB}$ be a source state shared
between a sender $A$ and a receiver $B$, and let $\Lambda$ be a POVM to simulate on the $A$ system of this
state. A protocol for faithful feedback simulation of the POVM with classical communication rate
$R$ and common randomness rate $S$ exists if and only if the following inequalities hold:

$$R \ge I(X;R|B),$$
$$R + S \ge H(X|B),$$

where the entropies are with respect to a state of the following form:

$$\sum_x |x\rangle\langle x|^X \otimes \mathrm{Tr}_A\left\{ \left(I^R \otimes \Lambda_x^A\right) \phi^{RAB} \right\},$$

and $\phi^{RAB}$ is some purification of the state $\rho^{AB}$.

Figure 7: Measurement compression with quantum side information protocol. The figure depicts the
most general protocol for this task when both the sender and receiver are to obtain the outcome of the measurement
simulation. Assuming that Alice and Bob share many copies of some bipartite state $\rho^{AB}$ and have common randomness
$M$ available, Alice simulates the measurement $\Lambda^{\otimes n}$ by performing some POVM conditional on the value of the common
randomness. Rather than send the full output $L$ of the measurement to Bob, Alice hashes it to $f(L)$ using some hash
function $f$, and she sends the hash $f(L)$ to Bob. Bob performs a measurement on his systems $B^n$, conditional on the
hash $f(L)$ and his share of the common randomness $M$. From this measurement, he can recover the full value of $L$
and then reconstruct the sequence $x^n$ using $L$ and $M$. The protocol is also a "feedback" simulation, such that Alice
recovers the outcome of the simulation by processing the classical registers $L$ and $M$. The protocol performs well if
the output of this simulated measurement is statistically indistinguishable from the output of the true measurement
$\Lambda^{\otimes n}$ (from the perspective of someone holding the reference systems and the measurement outcomes).

The achievable rate region closely resembles Figure 3, except that all of the information
quantities should be conditioned on the system $B$ since, in the new task, $B$ is available as quantum side
information.



It is instructive to see how the second example of Section 2.2.1 changes if quantum side
information is available. Suppose that Bob now possesses the purification of the maximally mixed state,
so that Alice and Bob share a Bell state before communication begins. This means that there is
no purification system $R$ because the state on $A$ and $B$ is already pure. In this case, the state on
$X$ and $B$ after the measurement in (15) is as follows:

$$\frac{1}{4}\left( |0\rangle\langle 0|^X \otimes |0\rangle\langle 0|^B + |1\rangle\langle 1|^X \otimes |1\rangle\langle 1|^B + |2\rangle\langle 2|^X \otimes |+\rangle\langle +|^B + |3\rangle\langle 3|^X \otimes |-\rangle\langle -|^B \right).$$

The conditional mutual information $I(X;R|B)$ is zero because the reference system is trivial. The
conditional entropy $H(X|RB) = H(X|B)$ is equal to one bit. A simple interpretation of this result
is that the measurement in (15) just requires one bit of common randomness in order to pick the
$X$ or $Z$ Pauli measurement at random. Bob then performs the selected measurement locally, and
the effect is the same as if Alice were to perform it on her share of the state because the state is
maximally entangled.
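
The value $H(X|B) = 1$ bit for this example can be checked numerically. The sketch below is an illustration only and is not part of the original analysis: it assumes that Bob's system is a single qubit, represents each conditional state by its Bloch vector, and uses the identity $H(X|B) = H(X) + \sum_x p_X(x) H(\rho_x) - H(\rho^B)$, which holds for classical-quantum states. The helper names (`qubit_entropy`, `conditional_entropy_cq`) are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def qubit_entropy(bloch):
    """Von Neumann entropy of a qubit state with the given Bloch vector.

    The eigenvalues of the state are (1 +/- |r|) / 2.
    """
    r = min(math.sqrt(sum(c * c for c in bloch)), 1.0)
    return binary_entropy((1.0 + r) / 2.0)


def conditional_entropy_cq(probs, bloch_vectors):
    """H(X|B) for the cq state sum_x p(x) |x><x| (x) rho_x, with qubit B."""
    # Bloch vector of the average state rho^B = sum_x p(x) rho_x.
    average = [sum(p * v[i] for p, v in zip(probs, bloch_vectors)) for i in range(3)]
    h_x = shannon_entropy(probs)
    h_b_given_x = sum(p * qubit_entropy(v) for p, v in zip(probs, bloch_vectors))
    h_b = qubit_entropy(average)
    return h_x + h_b_given_x - h_b


if __name__ == "__main__":
    # Conditional states |0>, |1>, |+>, |-> as Bloch vectors, each with probability 1/4.
    states = [(0, 0, 1), (0, 0, -1), (1, 0, 0), (-1, 0, 0)]
    print(conditional_entropy_cq([0.25] * 4, states))
```

With the four equiprobable states above, $H(X) = 2$ bits, each conditional state is pure, and the average state on $B$ is maximally mixed, so the function returns one bit, in agreement with the text.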

5.3 Achievability Proof for MC-QSI 

The resource inequality corresponding to MC-QSI is as follows: 

$$\langle \rho^{AB} \rangle + I(X;R|B)\,[c \to c] + H(X|RB)\,[cc] \ge \langle \Lambda^A(\rho^{AB}) \rangle.$$

The meaning of this resource inequality is that the sender and receiver can simulate the action of the
POVM $\Lambda^{\otimes n}$ on $n$ copies of the state $\rho^{AB}$, by exploiting $nI(X;R|B)$ bits of classical communication
and $nH(X|RB)$ bits of common randomness, and the simulation becomes exact as $n$ becomes large.

One might think that it would be possible to concatenate the protocols of measurement com- 
pression and CDC-QSI according to the rules of the resource calculus [22] in order to have a protocol 
for MC-QSI. The scheme that we develop below certainly does exploit features of both protocols, 
but a direct concatenation is not possible because Alice and Bob need to exploit the same codebook 
for both the measurement compression part and the CDC-QSI part of the protocol. We note that 
this is similar to the way that the protocol for channel simulation with quantum side information 
operates [44].

The basic strategy for MC-QSI is as follows. Alice simulates the measurement on the $A^n$ systems
of the IID state $(\rho^{AB})^{\otimes n}$, with the systems $R^n B^n$ acting as a purification of $A^n$, by first selecting the
variable $m$ according to the common randomness shared with Bob, and then by performing a POVM
$\{\Upsilon_l^{(m)}\}$ chosen according to a codebook $\mathcal{C} = \{x^n(l,m)\}$. (The codebook is of the form discussed
in Section 2.3.) Alice and Bob both know the codebook $\mathcal{C}$ used in the measurement compression
strategy. Bob shares the common randomness variable $m$ with Alice, and thus he already has this
as side information to help in determining the variable $l$. Alice hashes the variable $l$ according to
some hash function $f$ and sends the hash. Bob receives the hash $k = f(l)$, and then he "scans"
over all of the post-measurement states (corresponding to codewords $x^n(l',m)$) that are consistent
with the hash $k$ and his common randomness value $m$. We define the set $A(f,k,m)$ to denote the
set of all such codewords:

$$A(f,k,m) \equiv \{x^n(l,m) : f(l) = k,\ x^n(l,m) \in \mathcal{C}\}. \tag{70}$$

Observe that this set cannot be any larger than $\mathcal{L}$ (the set of all possible $l$):

$$|A(f,k,m)| \le |\mathcal{L}| = 2^{n[I(X;RB) + 3\delta]}.$$

The intuition behind the protocol is that measurement compression proceeds as before using
$nI(X;BR)$ bits for the outcome of the measurement and $nH(X|RB)$ for the common randomness,
because the systems $RB$ act as a purification for $A$. But in this case, Bob has the quantum systems
$B^n$ available and should be able to determine $nI(X;B)$ bits about $X^n$ by performing a collective
measurement on his systems (following from the HSW theorem [35, 55]). So, Alice should only
need to send the difference of these amounts, $n(I(X;BR) - I(X;B)) = nI(X;R|B)$, to Bob.
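
The rate bookkeeping in the preceding paragraph is an instance of the chain rule for quantum mutual information. Spelled out as a short worked equation (standard identities, stated here for convenience rather than quoted from the source):

```latex
\begin{align}
I(X;BR) &= I(X;B) + I(X;R|B), \\
n\left[ I(X;BR) - I(X;B) \right] &= n\, I(X;R|B), \\
I(X;R|B) + H(X|RB) &= H(X|B) - H(X|RB) + H(X|RB) = H(X|B).
\end{align}
```

The last line shows that the classical communication rate $I(X;R|B)$ and the common randomness rate $H(X|RB)$ sum to $H(X|B)$, which is consistent with the second inequality of Theorem 12.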

Detailed Strategy. The encoding strategy for this scenario is as follows. Alice and Bob are
allowed to have an agreed-upon hash function $f : \mathcal{L} \to \mathcal{K}$, selected at random from a two-universal
family (as described in Section 4.3). Alice's message to Bob will be an element of $\mathcal{K}$. The collision
probability for some $l \neq l'$ in this case is as follows:

$$\Pr_f\{f(l) = f(l')\} \le \frac{1}{|\mathcal{K}|} = 2^{-nR}.$$

Upon receiving the hash value $k$ and having a particular value $m$ for the common randomness,
Bob performs a sequence of binary measurements $\{\Pi_{x^n(l,m)}, I - \Pi_{x^n(l,m)}\}$ for all the codewords
$x^n(l,m) \in \mathcal{C}$ that are consistent with the hash value (so that $f(l) = k$). Note that $\Pi_{x^n(l,m)}$ is a
conditionally typical projector for the tensor-product state $\rho^{B^n}_{x^n(l,m)}$, the conditional state on Bob's
system after performing the ideal measurement $\Lambda^{\otimes n}$ and receiving the outcome $x^n(l,m)$. Recall
from (70) that $A(f,k,m)$ is the set of all such codewords. In the following, we will show that, by
choosing

$$|\mathcal{L}| = 2^{n[I(X;RB) + 3\delta]}, \tag{71}$$
$$|\mathcal{M}| = 2^{n[H(X|RB) + \delta]}, \tag{72}$$
$$|\mathcal{K}| = 2^{n[I(X;R|B) + 11\delta]}, \tag{73}$$

the error probability will approach zero as $n$ goes to infinity.

Error Analysis. The error probability for this decoder is then as follows:

$$\Pr\{\text{``error @ decoder''}\} = \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \Pr\{\text{``error @ decoder''} \mid l, m\}, \tag{74}$$

where

$$q(x^n(l,m)) \equiv \mathrm{Tr}\left\{ \left( (\Upsilon_l^{(m)})^{A^n} \otimes I^{B^n} \right) (\rho^{AB})^{\otimes n} \right\}$$

is the probability of receiving outcome $l$ when performing the simulated measurement in (33). The
post-measurement states on $B^n$ for the POVM $\{\Upsilon_l^{(m)}\}$ are as follows:

$$\tilde{\rho}^{B^n}_{x^n(l,m)} \equiv \frac{1}{q(x^n(l,m))}\, \mathrm{Tr}_{A^n}\left\{ \left( (\Upsilon_l^{(m)})^{A^n} \otimes I^{B^n} \right) (\rho^{AB})^{\otimes n} \right\}.$$

Note that the probability masses $q(x^n(l,m))$ and $q(x^n(l',m'))$ and the states $\tilde{\rho}^{B^n}_{x^n(l,m)}$ and $\tilde{\rho}^{B^n}_{x^n(l',m')}$
are equivalent, respectively, if two different codewords have the same value (i.e., if $l \neq l'$ or $m \neq m'$
but $x^n(l,m) = x^n(l',m')$, then $q(x^n(l,m)) = q(x^n(l',m'))$ and $\tilde{\rho}^{B^n}_{x^n(l,m)} = \tilde{\rho}^{B^n}_{x^n(l',m')}$);
this is due to the way that we choose the measurement operators $\Upsilon_l^{(m)}$ in (33) for the measurement simulation.

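Now we consider the error term $\Pr\{\text{``error @ decoder''} \mid l, m\}$. Let $a_1^{(km)}, \ldots, a_{|A|}^{(km)}$ enumerate
all of the codewords $x^n(l,m)$ in the set $A(f,k,m)$ defined in (70) (those codewords $x^n(l,m)$
consistent with the hash $k$). Let $a_j^{(km)}$ denote the actual codeword $x^n(l,m)$ produced by the simulated
measurement. The probability for a correct decoding for Bob is as follows:

$$\mathrm{Tr}\left\{ \Pi_{a_j^{(km)}} \hat{\Pi}_{a_{j-1}^{(km)}} \cdots \hat{\Pi}_{a_1^{(km)}}\, \tilde{\rho}^{B^n}_{a_j^{(km)}}\, \hat{\Pi}_{a_1^{(km)}} \cdots \hat{\Pi}_{a_{j-1}^{(km)}} \Pi_{a_j^{(km)}} \right\},$$

so that the binary tests give a response of "no" until the test for $a_j^{(km)}$ gives a response of "yes."
Then the probability for incorrectly decoding is

$$1 - \mathrm{Tr}\left\{ \Pi_{a_j^{(km)}} \hat{\Pi}_{a_{j-1}^{(km)}} \cdots \hat{\Pi}_{a_1^{(km)}}\, \tilde{\rho}^{B^n}_{a_j^{(km)}}\, \hat{\Pi}_{a_1^{(km)}} \cdots \hat{\Pi}_{a_{j-1}^{(km)}} \Pi_{a_j^{(km)}} \right\},$$

so that we can write the error probability in (74) as follows (for this decoding strategy):

$$\frac{1}{|\mathcal{M}|} \sum_{k,m} \sum_{a_j^{(km)} \in A(f,k,m)} q(a_j^{(km)}) \left( 1 - \mathrm{Tr}\left\{ \Pi_{a_j^{(km)}} \hat{\Pi}_{a_{j-1}^{(km)}} \cdots \hat{\Pi}_{a_1^{(km)}}\, \tilde{\rho}^{B^n}_{a_j^{(km)}}\, \hat{\Pi}_{a_1^{(km)}} \cdots \hat{\Pi}_{a_{j-1}^{(km)}} \Pi_{a_j^{(km)}} \right\} \right). \tag{75}$$

Observe that the above error probability is equal to

$$\frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathrm{Tr}\left\{ \left(I - \Theta_{x^n(l,m)}\right) \tilde{\rho}^{B^n}_{x^n(l,m)} \right\} \tag{76}$$

if we define the POVM element $\Theta_{x^n(l,m)}$ as

$$\Theta_{x^n(l,m)} \equiv \hat{\Pi}_{a_1^{(km)}} \cdots \hat{\Pi}_{a_{j-1}^{(km)}}\, \Pi_{a_j^{(km)}}\, \hat{\Pi}_{a_{j-1}^{(km)}} \cdots \hat{\Pi}_{a_1^{(km)}},$$

where we recall that $a_j^{(km)} = x^n(l,m)$.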

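Now, we can further express the error probability in (76) as follows, by employing an indicator
function:

$$\sum_{x^n \in \mathcal{C}} \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathcal{I}(x^n = x^n(l,m))\, \mathrm{Tr}\left\{ \left(I - \Theta_{x^n(l,m)}\right) \tilde{\rho}^{B^n}_{x^n(l,m)} \right\}$$
$$\le \sum_{x^n \in \mathcal{C}} \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathcal{I}(x^n = x^n(l,m))\, \mathrm{Tr}\left\{ \left(I - \Theta'_{x^n}\right) \tilde{\rho}^{B^n}_{x^n(l,m)} \right\}, \tag{77}$$

where we define $\Theta'_{x^n}$ as a POVM element corresponding to a worst-case decoding over the states
$\tilde{\rho}^{B^n}_{x^n(l,m)}$ with the same codeword value $x^n$:

$$\Theta'_{x^n} \equiv \arg\max_{\Theta_{x^n(l,m)} \,:\, x^n = x^n(l,m)} \mathrm{Tr}\left\{ \left(I - \Theta_{x^n(l,m)}\right) \tilde{\rho}^{B^n}_{x^n(l,m)} \right\}.$$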
Let $\mathcal{C}'$ be a pruning of the original codebook $\mathcal{C}$ containing no duplicate entries (it contains only
the codewords with worst-case error probabilities as given above). Then the last line in (77) is
equivalent to the following one:

$$\sum_{x^n \in \mathcal{C}'} \mathrm{Tr}\left\{ \left(I - \Theta'_{x^n}\right) \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathcal{I}(x^n = x^n(l,m))\, \tilde{\rho}^{B^n}_{x^n(l,m)} \right\}. \tag{78}$$

This decoding scheme will only work well if the states $\tilde{\rho}^{B^n}_{x^n(l,m)}$ are close to the tensor product
states $\rho^{B^n}_{x^n(l,m)}$ that would result from the ideal measurement. We expect that this should hold if
the measurement compression part of the protocol is successful, and we prove this in detail in what
follows.



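The quantity characterizing a faithful measurement simulation in (11) is equivalent to the
following (one can show this by exploiting the definitions of the measurement maps and the
post-measurement states given above, after tracing out the reference systems):

$$\Delta(\mathcal{C}) \equiv \sum_{x^n} \left\| p_{X^n}(x^n)\, \rho^{B^n}_{x^n} - \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathcal{I}(x^n = x^n(l,m))\, \tilde{\rho}^{B^n}_{x^n(l,m)} \right\|_1$$
$$= \sum_{x^n \notin \mathcal{C}'} \left\| p_{X^n}(x^n)\, \rho^{B^n}_{x^n} \right\|_1 + \sum_{x^n \in \mathcal{C}'} \left\| p_{X^n}(x^n)\, \rho^{B^n}_{x^n} - \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathcal{I}(x^n = x^n(l,m))\, \tilde{\rho}^{B^n}_{x^n(l,m)} \right\|_1,$$

where $\mathcal{C}'$ is the pruned codebook containing no duplicate entries (observe that the indicator function
$\mathcal{I}(x^n = x^n(l,m))$ captures all of the duplicates). Applying the trace inequality $\mathrm{Tr}\{\Lambda \sigma\} \le \mathrm{Tr}\{\Lambda \rho\} + \|\rho - \sigma\|_1$
from Lemma 17 to the expression in (78), we then obtain the following upper bound on
it:

$$\le \sum_{x^n \in \mathcal{C}'} \mathrm{Tr}\left\{ \left(I - \Theta'_{x^n}\right) p_{X^n}(x^n)\, \rho^{B^n}_{x^n} \right\} + \sum_{x^n \in \mathcal{C}'} \left\| p_{X^n}(x^n)\, \rho^{B^n}_{x^n} - \frac{1}{|\mathcal{M}|} \sum_{l,m} q(x^n(l,m))\, \mathcal{I}(x^n = x^n(l,m))\, \tilde{\rho}^{B^n}_{x^n(l,m)} \right\|_1$$
$$\le \sum_{x^n \in \mathcal{C}'} p_{X^n}(x^n)\, \mathrm{Tr}\left\{ \left(I - \Theta'_{x^n}\right) \rho^{B^n}_{x^n} \right\} + \Delta(\mathcal{C}).$$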

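We can now focus on bounding the first term in the last line above. Expanding it again
leads to

$$\sum_{x^n \in \mathcal{C}'} p_{X^n}(x^n)\, \mathrm{Tr}\left\{ \left(I - \Theta'_{x^n}\right) \rho^{B^n}_{x^n} \right\}$$
$$= \sum_{k \in \mathcal{K},\, m \in \mathcal{M}}\ \sum_{a_j^{(km)} \in A'(f,k,m)} p_{X^n}(a_j^{(km)})\, \mathrm{Tr}\left\{ \left(I - \Theta_{a_j^{(km)}}\right) \rho^{B^n}_{a_j^{(km)}} \right\}$$
$$= \sum_{k \in \mathcal{K},\, m \in \mathcal{M}}\ \sum_{a_j^{(km)} \in A'(f,k,m)} p_{X^n}(a_j^{(km)}) \left( 1 - \mathrm{Tr}\left\{ \Pi_{a_j^{(km)}} \hat{\Pi}_{a_{j-1}^{(km)}} \cdots \hat{\Pi}_{a_1^{(km)}}\, \rho^{B^n}_{a_j^{(km)}}\, \hat{\Pi}_{a_1^{(km)}} \cdots \hat{\Pi}_{a_{j-1}^{(km)}} \Pi_{a_j^{(km)}} \right\} \right),$$

where

$$A'(f,k,m) \equiv \{x^n(l,m) : f(l) = k,\ x^n(l,m) \in \mathcal{C}'\}.$$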



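We can then insert the average state typical projector as we did before in (66), in order to bound
the last line from above as

$$\sum_{k,m}\ \sum_{a_j^{(km)} \in A'(f,k,m)} p_{X^n}(a_j^{(km)}) \left( \mathrm{Tr}\left\{ \Pi \rho^{B^n}_{a_j^{(km)}} \Pi \right\} - \mathrm{Tr}\left\{ \Pi_{a_j^{(km)}} \hat{\Pi}_{a_{j-1}^{(km)}} \cdots \hat{\Pi}_{a_1^{(km)}}\, \Pi \rho^{B^n}_{a_j^{(km)}} \Pi\, \hat{\Pi}_{a_1^{(km)}} \cdots \hat{\Pi}_{a_{j-1}^{(km)}} \Pi_{a_j^{(km)}} \right\} \right).$$

The error accumulated in doing so is $\epsilon + 2\sqrt{\epsilon}$ as before. At this point, we can apply Sen's
non-commutative union bound (Lemma 18 in Appendix B) and concavity of square root to obtain the
upper bound

$$2 \sqrt{ \sum_{k,m}\ \sum_{a_j^{(km)} \in A'(f,k,m)} p_{X^n}(a_j^{(km)}) \left[ \mathrm{Tr}\left\{ \left(I - \Pi_{a_j^{(km)}}\right) \Pi \rho^{B^n}_{a_j^{(km)}} \Pi \right\} + \sum_{i=1}^{j-1} \mathrm{Tr}\left\{ \Pi_{a_i^{(km)}}\, \Pi \rho^{B^n}_{a_j^{(km)}} \Pi \right\} \right] }.$$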



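The first term inside the square root we can bound from above by $\epsilon + 2\sqrt{\epsilon}$ as we did before in (67)
using properties of quantum typicality. We continue bounding the second term as follows:

$$\sum_{k,m}\ \sum_{a_j^{(km)} \in A'(f,k,m)} p_{X^n}(a_j^{(km)}) \sum_{i=1}^{j-1} \mathrm{Tr}\left\{ \Pi_{a_i^{(km)}}\, \Pi \rho^{B^n}_{a_j^{(km)}} \Pi \right\}$$
$$\le \sum_{k,m}\ \sum_{a_j^{(km)} \in A'(f,k,m)} p_{X^n}(a_j^{(km)}) \sum_{i \neq j \,:\, a_i^{(km)} \in A(f,k,m)} \mathrm{Tr}\left\{ \Pi_{a_i^{(km)}}\, \Pi \rho^{B^n}_{a_j^{(km)}} \Pi \right\}$$
$$= \sum_{l,m \,:\, x^n(l,m) \in \mathcal{C}'} p_{X^n}(x^n(l,m)) \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathcal{I}\left(f(l') = f(l)\right) \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi \rho^{B^n}_{x^n(l,m)} \Pi \right\},$$

where the two steps follow by including all indices in the sum not equal to $j$ and rewriting the sum
with indicator functions. We now take an expectation with respect to the random hash (realizing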
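that we could have done this the whole time):

$$\mathbb{E}_f\left\{ \sum_{l,m \,:\, x^n(l,m) \in \mathcal{C}'} p_{X^n}(x^n(l,m)) \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathcal{I}\left(f(l') = f(l)\right) \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi \rho^{B^n}_{x^n(l,m)} \Pi \right\} \right\}$$
$$= \sum_{l,m \,:\, x^n(l,m) \in \mathcal{C}'} p_{X^n}(x^n(l,m)) \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathbb{E}_f\left\{ \mathcal{I}\left(f(l') = f(l)\right) \right\} \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi \rho^{B^n}_{x^n(l,m)} \Pi \right\}$$
$$= \sum_{l,m \,:\, x^n(l,m) \in \mathcal{C}'} p_{X^n}(x^n(l,m)) \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \Pr_f\left\{ f(l') = f(l) \right\} \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi \rho^{B^n}_{x^n(l,m)} \Pi \right\}$$
$$\le 2^{-nR} \sum_{l,m \,:\, x^n(l,m) \in \mathcal{C}'} p_{X^n}(x^n(l,m)) \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi \rho^{B^n}_{x^n(l,m)} \Pi \right\}$$
$$\le 2^{-nR} \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi\, p_{X^n}(x^n(l,m))\, \rho^{B^n}_{x^n(l,m)}\, \Pi \right\}$$
$$\le 2^{-nR}\, 2^{-n[H(X) - \delta]} \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \Pi_{x^n(l',m)}\, \Pi \rho^{B^n}_{x^n(l,m)} \Pi \right\}. \tag{79}$$

The first inequality follows from the two-universal hashing property. The second inequality follows
from summing over all of the codewords in $\mathcal{C}$, not just the non-duplicate entries in $\mathcal{C}'$. The third
inequality follows because all of the sequences $x^n(l,m)$ are chosen to be strongly typical (recall the
construction in Section 2.3) and by upper bounding their probabilities by $2^{-n[H(X) - \delta]}$. From here,
we exploit the fact that the codewords $x^n(l,m)$ were chosen randomly as specified in Section 2.3.
So we now consider $X^n(l,m)$ as random variables and take the expectation with respect to them
(realizing again that we could have done this the whole time and focusing on the rightmost term
above):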



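$$\mathbb{E}_{X^n}\left\{ \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \Pi_{X^n(l',m)}\, \Pi \rho^{B^n}_{X^n(l,m)} \Pi \right\} \right\}$$
$$= \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \mathbb{E}_{X^n}\left\{ \Pi_{X^n(l',m)} \right\} \Pi\, \mathbb{E}_{X^n}\left\{ \rho^{B^n}_{X^n(l,m)} \right\} \Pi \right\}$$
$$= \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \mathbb{E}_{X^n}\left\{ \Pi_{X^n(l',m)} \right\} \Pi \left[ \sum_{x^n} p'_{X'^n}(x^n)\, \rho^{B^n}_{x^n} \right] \Pi \right\}$$
$$\le [1 - \epsilon]^{-1}\, 2^{-n[H(B) - \delta]} \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathrm{Tr}\left\{ \mathbb{E}_{X^n}\left\{ \Pi_{X^n(l',m)} \right\} \Pi \right\}.$$

The first equality follows because the indices $l'$ and $l$ are different, implying that the random
variables $X^n(l',m)$ and $X^n(l,m)$ are independent so that we can distribute the expectation. The
inequality follows by applying the operator inequality (105) from Appendix A and properties of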
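quantum typicality. Continuing, we have

$$\le [1 - \epsilon]^{-1}\, 2^{-n[H(B) - \delta]} \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} \mathbb{E}_{X^n}\left\{ \mathrm{Tr}\left\{ \Pi_{X^n(l',m)} \right\} \right\}$$
$$\le [1 - \epsilon]^{-1}\, 2^{-n[H(B) - \delta]} \sum_{l,m}\ \sum_{l' \in \mathcal{L} \,:\, l' \neq l} 2^{n[H(B|X) + \delta]}$$
$$\le [1 - \epsilon]^{-1}\, 2^{-n[H(B) - \delta]}\, 2^{n[H(B|X) + \delta]}\, |\mathcal{L} \times \mathcal{M}|\, |\mathcal{L}|$$
$$\le [1 - \epsilon]^{-1}\, 2^{-n[H(B) - \delta]}\, 2^{n[H(B|X) + \delta]}\, 2^{n[H(X) + 4\delta]}\, 2^{n[I(X;RB) + 3\delta]}.$$

The first inequality is from $\Pi \le I$ and the second is from the bound $\mathrm{Tr}\{\Pi_{x^n(l',m)}\} \le 2^{n[H(B|X) + \delta]}$.
The final inequality follows from the selection for the sizes of $\mathcal{L}$ and $\mathcal{M}$ in (71)-(72). Combining this
bound with the one in (79), our final upper bound is

$$[1 - \epsilon]^{-1}\, 2^{-n[R - I(X;R|B) - 10\delta]}.$$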

Collecting everything together, we arrive at the following upper bound on the decoding error
probability in (74):

$$\epsilon''' \equiv \Delta(\mathcal{C}) + \epsilon + 2\sqrt{\epsilon} + 2\sqrt{\epsilon + 2\sqrt{\epsilon} + [1 - \epsilon]^{-1}\, 2^{-n[R - I(X;R|B) - 10\delta]}}.$$

Thus, as long as we choose $R = I(X;R|B) + 11\delta$, the expectation of this error with respect to the
hash function and the random choice of code vanishes in the asymptotic limit.

We now complete our achievability proof by demonstrating that there exists a choice of the
$\{X^n(l,m)\}$ codewords such that the measurement simulation error and the decoding error become
arbitrarily small. Let $F$ be the event that the decoding error probability is less than $\sqrt{\epsilon'''}$. Then we
have the following upper bound on the complement of this event by invoking Markov's inequality:

$$\Pr\{F^c\} \le \frac{\mathbb{E}_{\mathcal{C},f}\{\text{``decoding error''}\}}{\sqrt{\epsilon'''}} \le \sqrt{\epsilon'''}.$$

Thus, by choosing $|\mathcal{L}|$, $|\mathcal{M}|$, and $|\mathcal{K}|$ appropriately, we can have all of the events $E_m$, $E_0$, and
$F$ be true for some choice of the codebook $\{x^n(l,m)\}$ and the hash $f$ (similar to the "union
bound" argument in (31)), so that both the measurement simulation error and the decoding error
probability are arbitrarily small for sufficiently large $n$.

After determining the sequence $x^n$ resulting from the measurement simulation, Bob can place it in a classical register. By using the fact that the measurement simulation and the decoding are successful and employing an argument similar to that at the end of Section 4.3, we know that the disturbance of the state is asymptotically negligible, so that the condition in (80) for a good protocol is satisfied.

A proof similar to that in Theorem 8 implies that Alice and Bob can exploit quantum side information and a protocol similar to the above to simulate a general quantum instrument, in such a way that Alice possesses the quantum output and Bob obtains the classical output. The resulting resource inequality is stated in (81) below.

5.4 Converse for measurement compression with QSI



The converse proof for measurement compression with QSI demonstrates the optimality of the 
protocol from the previous section. Specifically, it shows that the single-letter rates in Theorem 12 
are optimal for the case of a feedback simulation. 

The most general protocol for this task has Alice combine her shares $A^n$ of the state with her share of the common randomness $M$ and perform some quantum operation with quantum outputs $A'^n$ and classical output $L$. Alice then processes this variable $L$ to produce another random variable $L'$, which she sends to Bob over some noiseless classical bit channels. Bob feeds $L'$, his share of the common randomness, and his systems $B^n$ into some quantum operation with classical outputs $X^n$ and quantum outputs $B'^n$. If the protocol is any good for this task, then the actual state $\omega^{R^n A'^n X'^n X^n B'^n}$ should be $\epsilon$-close in trace distance to the ideal state $\sigma^{R^n A'^n X^n \bar{X}^n B'^n}$, where $\bar{X}^n$ is a copy of the variable $X^n$:

$$\left\| \omega^{R^n A'^n X'^n X^n B'^n} - \sigma^{R^n A'^n X^n \bar{X}^n B'^n} \right\|_1 \leq \epsilon. \tag{80}$$



A proof for the first bound in Theorem 12 goes as follows:

$$\begin{aligned}
nR &\geq H(L') \\
&\geq I(L'; M B^n R^n) \\
&= I(L' M B^n; R^n) + I(L'; M B^n) - I(R^n; M B^n) \\
&\geq I(L' M B^n; R^n) - I(R^n; B^n) \\
&\geq I(X^n B'^n; R^n)_\omega - I(R^n; B^n)_\sigma \\
&\geq I(X^n B^n; R^n)_\sigma - I(R^n; B^n)_\sigma - n\epsilon' \\
&= I(X^n; R^n | B^n)_\sigma - n\epsilon' \\
&= n I(X; R|B) - n\epsilon'.
\end{aligned}$$

The first two inequalities are straightforward (similar to steps in our previous converse proofs). The first equality is an identity for quantum mutual information. The third inequality follows because $I(L'; MB^n) \geq 0$ and the common randomness $M$ is uncorrelated with systems $R^n$ and $B^n$ (so that $I(R^n; MB^n) = I(R^n; B^n)$). The fourth inequality follows from quantum data processing of the systems $L'MB^n$ to produce the systems $X^n B'^n$. The fifth inequality follows from the condition in (80) and continuity of quantum mutual information (the Alicki-Fannes' inequality [3]), where $\epsilon'$ is some function $g(\epsilon)$ such that $\lim_{\epsilon \to 0} g(\epsilon) = 0$. The second equality follows from the chain rule for quantum mutual information: $I(X^n B^n; R^n)_\sigma = I(X^n; R^n | B^n)_\sigma + I(B^n; R^n)_\sigma$. The final equality follows because conditional quantum mutual information is additive on tensor-power states.
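For readers who want to check the mutual information identity used in the first equality, it can be verified by expanding each term into von Neumann entropies. This is only a worked expansion of the standard definition $I(A;B) = H(A) + H(B) - H(AB)$, applied to the systems named in the proof:

```latex
\begin{align*}
& I(L'MB^n; R^n) + I(L'; MB^n) - I(R^n; MB^n) \\
&\quad = \big[ H(L'MB^n) + H(R^n) - H(L'MB^nR^n) \big]
       + \big[ H(L') + H(MB^n) - H(L'MB^n) \big] \\
&\qquad - \big[ H(R^n) + H(MB^n) - H(MB^nR^n) \big] \\
&\quad = H(L') + H(MB^nR^n) - H(L'MB^nR^n) \\
&\quad = I(L'; MB^nR^n).
\end{align*}
```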

The argument justifying the other bound in Theorem 12 goes as follows:

$$\begin{aligned}
n(R+S) &\geq H(L'M) \\
&\geq H(L'M | B^n) \\
&= I(X'^n; L'M | B^n) + H(L'M | B^n X'^n) \\
&\geq I(X'^n; L'M | B^n) \\
&= H(X'^n | B^n) - H(X'^n | L'MB^n) \\
&\geq H(X'^n | B^n)_\omega - H(X'^n | X^n)_\omega \\
&\geq H(X^n | B^n)_\sigma - n\epsilon' \\
&= n H(X|B) - n\epsilon'.
\end{aligned}$$

The first two inequalities are straightforward. The first equality is an identity for quantum mutual information. The third inequality follows because the entropy $H(L'M | B^n X'^n) \geq 0$ for classical systems $L'$ and $M$. The second equality is an identity for quantum mutual information. The fourth inequality follows from quantum data processing of the systems $L'MB^n$. The last inequality follows from the condition in (80), continuity of entropy, and the fact that $H(X^n | \bar{X}^n)_\sigma = 0$ since $\bar{X}^n$ is a copy of $X^n$. The final equality follows because the entropy is additive for tensor-power states.

Optimality of the bound $R + S \geq H(X|B)$ for negative $S$ follows by considering a protocol whereby Alice uses classical communication alone in order to simulate $X^n$ and generate common randomness $M$ with Bob. The converse in this case proceeds as follows:

$$\begin{aligned}
nR &\geq H(L') \\
&\geq H(L' | B^n) \\
&\geq I(X'^n M; L' | B^n) \\
&= H(X'^n M | B^n) - H(X'^n M | L' B^n) \\
&\geq H(X'^n M | B^n)_\omega - H(X'^n M | X^n M)_\omega \\
&\geq H(X^n M | B^n)_\sigma - n\epsilon' \\
&= n H(X|B) + H(M) - n\epsilon' \\
&= n H(X|B) + n|S| - n\epsilon'.
\end{aligned}$$

The fourth inequality follows because Bob has to process $L'$ and $B^n$ in order to recover the approximate $X^n$ and $M$. The fifth inequality follows because these systems should be close to the ideal ones for a good protocol (and applying continuity of entropy). The next equalities follow because the information quantities factor as above for the ideal state.



5.5 Relation of MC-QSI to other protocols 

We remark on the connection between measurement compression with quantum side information and two other protocols: channel simulation with quantum side information [44] and state redistribution [29, 68]. MC-QSI lies somewhere in between both of these protocols: it generalizes channel simulation with QSI but is not "fully quantum," in contrast to state redistribution, which is. Channel simulation with QSI is a protocol whereby a sender and receiver share many copies of a classical-quantum state $\sum_y p_Y(y) |y\rangle\langle y|^Y \otimes \rho_y$ distributed to them by a source, with the sender holding the classical systems and the receiver holding the quantum systems. The goal is for the sender and receiver to simulate the action of a noisy classical channel $p_{X|Y}(x|y)$ on the sender's classical systems by using as few noiseless bit channels and common randomness bits as possible. Luo and Devetak found that this is possible by using a classical communication rate of $I(X;Y|B)$ and a common randomness rate of $H(X|YB)$ (compare with $I(X;R|B)$ and $H(X|RB)$ for MC-QSI), where the entropies are with respect to a state of the following form:

$$\sum_{y,x} p_Y(y)\, p_{X|Y}(x|y)\, |y\rangle\langle y|^Y \otimes |x\rangle\langle x|^X \otimes \rho_y.$$

(They actually found the rates to be $I(Y;X) - I(B;X)$ and $H(X|Y)$, but combining these rates with the fact that $I(X;B|Y) = 0$ for a state of the above form gives the rates we state above.) This protocol exploits aspects of CDC-QSI and the classical reverse Shannon theorem in its proof. It has applications to rate distortion theory with quantum side information and in devising a simpler proof of the distillable common randomness from quantum states [28]. The completely classical version of this protocol has further applications to multi-terminal problems in classical rate distortion theory [43]. Clearly, our protocol generalizes channel simulation with QSI because a classical channel, a classical-to-classical map, is a special case of a quantum measurement, a quantum-to-classical map.
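The equivalence between the two ways of writing the Luo-Devetak rates can be checked numerically in a fully classical special case. The sketch below assumes a hypothetical toy source in which $Y$ is a uniform bit, $X$ is $Y$ passed through a binary symmetric channel with flip probability 0.1, and the side information $B$ is a classical bit obtained from $Y$ through a binary symmetric channel with flip probability 0.2 (so that $X - Y - B$ is a Markov chain and $I(X;B|Y) = 0$). None of these numbers come from the paper.

```python
import math
from itertools import product

def marginal_entropy(joint, keep):
    """Shannon entropy (bits) of the marginal of `joint` on the positions in `keep`."""
    marg = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def bsc(flip):
    """Transition probability p(out | inp) of a binary symmetric channel."""
    return lambda out, inp: 1.0 - flip if out == inp else flip

# Hypothetical toy source: Y uniform, X and B are noisy copies of Y.
p_x_given_y = bsc(0.1)
p_b_given_y = bsc(0.2)
joint = {
    (y, x, b): 0.5 * p_x_given_y(x, y) * p_b_given_y(b, y)
    for y, x, b in product((0, 1), repeat=3)
}
Y, X, B = 0, 1, 2  # positions of the variables inside each outcome tuple

def H(*keep):
    return marginal_entropy(joint, keep)

# Communication rate written both ways: I(X;Y|B) versus I(Y;X) - I(B;X).
i_xy_given_b = H(X, B) + H(Y, B) - H(Y, X, B) - H(B)
i_yx_minus_i_bx = (H(Y) + H(X) - H(Y, X)) - (H(B) + H(X) - H(X, B))

# Common randomness rate written both ways: H(X|YB) versus H(X|Y).
h_x_given_yb = H(Y, X, B) - H(Y, B)
h_x_given_y = H(Y, X) - H(Y)
```

In this toy case the two expressions for each rate agree up to floating-point error, as the Markov structure requires; a genuinely quantum $B$ would need density matrices rather than probability tables.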

State redistribution is a protocol that generalizes MC-QSI to the setting where one would like to simulate the action of a noisy quantum channel on some bipartite state $\rho^{AB}$. That is, state redistribution leads to a quantum reverse Shannon theorem in the presence of quantum side information, which we call QRST-QSI (the authors of Refs. [29, 68] did not emphasize this aspect of their protocol). Indeed, supposing that the goal is to simulate the action of a channel $\mathcal{N}^{A \to B'}$ on the bipartite state, they could proceed by Alice locally performing the isometric extension $U_{\mathcal{N}}^{A \to B'E}$ of the channel $\mathcal{N}^{A \to B'}$ on the $A$ system of the state $\rho^{AB}$. Including the reference $R$ as a purification of $\rho^{AB}$, there are four systems $RB'EB$ after she does so, where Alice possesses $B'$ and $E$ and Bob possesses $B$. Alice and Bob then operate according to the state redistribution protocol in order for Alice to transfer the $B'$ system to Bob (this effects the channel simulation of $\mathcal{N}^{A \to B'}$ on the state $\rho^{AB}$). Transferring the state requires some rate $Q$ of noiseless quantum communication and some rate $E$ of noiseless entanglement, and according to the main theorem of Refs. [29, 68], this is possible as long as

$$\begin{aligned}
Q &\geq \tfrac{1}{2} I(B'; R | B), \\
Q + E &\geq H(B' | B).
\end{aligned}$$

Comparing the above rate region with Theorem 12 of this paper reveals a close analogy between noiseless quantum communication / entanglement in QRST-QSI and noiseless classical communication / common randomness in MC-QSI, with the factor of 1/2 above accounting for the fact that the communication in QRST-QSI is quantum. One should be aware, though, that this connection is only a formal similarity: in QRST-QSI, the protocol can sometimes generate entanglement rather than consume it, depending on the channel and the state on which the channel acts (this can never happen in MC-QSI because the entropy $H(X|RB)$ is always non-negative for a classical $X$ system).

5.6 Applications of MC-QSI 

We now discuss three applications of MC-QSI. The first application is one that two of us announced in Ref. [38], the second involves developing a quantum reverse Shannon theorem for a quantum instrument, and the third is in reducing the classical communication cost of the local purity distillation protocol outlined in Ref. [41].

5.6.1 Classically-assisted state redistribution

For the first application, the setting is that Alice and Bob share many copies of some bipartite state $\rho^{AB}$, and we would like to know how the resources of classical communication, quantum communication, and entanglement can combine with the state $\rho^{AB}$ for different information processing tasks. Let $|\psi\rangle^{RAB}$ be a purification of $\rho^{AB}$. We found a general protocol, called "classically-assisted state redistribution," that when combined with teleportation, super-dense coding, and entanglement distribution can generate all of the known "static" protocols in the literature and is furthermore optimal for these tasks according to a multi-letter converse theorem [38]. In the first step of classically-assisted state redistribution, Alice and Bob employ the MC-QSI protocol in order to implement the following resource inequality:

$$\langle \rho^{AB} \rangle + I(X_B; R|B)[c \to c] + H(X_B|RB)[cc] \geq \langle \Delta^{X \to X_A X_B} \circ T^{A \to A'XE'} : \rho^{AB} \rangle. \tag{81}$$

In the above, the resource on the RHS is a remote instrument, such that the map $T^{A \to A'XE'}$ is first simulated in such a way that Alice possesses the environment $E'$ of the instrument $T^{A \to A'X}$, followed by a copying of the classical output $X$ to one for Alice ($X_A$) and one for Bob ($X_B$) (the notation $\Delta^{X \to X_A X_B}$ indicates a classical copying channel). Let $\sigma^{A' X_A X_B E' B}$ denote the post-measurement state. Conditional on the classical variable $X$, the parties then perform the state redistribution protocol [29, 68], in which Alice redistributes the share $A'$ of the post-measurement state to Bob. The resource inequality for this task is as follows:

$$\langle \sigma^{A'E'X_A | BX_B} \rangle + \tfrac{1}{2} I(A'; R | BX_B)[q \to q] + \tfrac{1}{2}\left( I(A'; E' | X_B) - I(A'; B | X_B) \right)[qq] \geq \langle \sigma^{E'X_A | A'BX_B} \rangle,$$

where the vertical divider $|$ for the states above indicates who possesses what systems and the information quantities are all conditioned on $X$ since this classical variable is available to both parties. The above resource inequality is equivalent to the following one, after applying the identity $I(A'; R | BX_B) = I(A'; R | E'X_B)$ [29, 68] and moving the entanglement consumption to the RHS along with a sign inversion (so that it now corresponds to an entanglement generation rate):

$$\langle \sigma^{A'X_AE' | BX_B} \rangle + \tfrac{1}{2} I(A'; R | E'X_B)[q \to q] \geq \tfrac{1}{2}\left( I(A'; B | X_B) - I(A'; E' | X_B) \right)[qq] + \langle \sigma^{E'X_A | A'BX_B} \rangle.$$

Overall, we then have the following resource inequality

$$\langle \rho^{AB} \rangle + I(X_B; R|B)[c \to c] + H(X_B|RB)[cc] + \tfrac{1}{2} I(A'; R | E'X_B)[q \to q] \geq \tfrac{1}{2}\left( I(A'; B | X_B) - I(A'; E' | X_B) \right)[qq],$$

if we are not concerned with the "state redistribution" aspect of the protocol and merely its abilities 
for entanglement distillation. Finally, since the goal of the protocol is entanglement distillation and 
not actually simulating the measurement, we can exploit the common randomness to agree upon a 
particular protocol in the ensemble of these protocols for the task of entanglement distillation and 
it is not necessary to have common randomness as a resource (it can be derandomized and this is 
the content of Corollary 4.8 of Ref. [2B]). The final resource inequality is then 

$$\langle \rho^{AB} \rangle + I(X_B; R|B)[c \to c] + \tfrac{1}{2} I(A'; R | E'X_B)[q \to q] \geq \tfrac{1}{2}\left( I(A'; B | X_B) - I(A'; E' | X_B) \right)[qq].$$

Combining the above protocol with teleportation, super-dense coding, and entanglement distribu- 
tion then gives all of the known protocols on the "static branch" of quantum information theory. 

5.6.2 Quantum reverse Shannon theorem for a quantum instrument

The quantum reverse Shannon theorem in its simplest form makes a statement regarding the ability of noiseless quantum communication and entanglement to simulate the action of some channel on many copies of a state $\rho$. A simple extension of the theorem that we discussed in the introduction is QRST-QSI, which simulates the channel on a bipartite state $\rho^{AB}$. The resource inequality for this protocol is as follows:

$$\tfrac{1}{2} I(R; B' | B)_\omega [q \to q] + \tfrac{1}{2}\left( I(B'; E)_\omega - I(B'; B)_\omega \right)[qq] \geq \langle U_{\mathcal{N}}^{A \to B'E} : \rho^{AB} \rangle,$$

where the information quantities are with respect to a state $\omega^{RB'EB}$ of the following form:

$$|\omega\rangle^{RB'EB} \equiv U_{\mathcal{N}}^{A \to B'E} |\psi_\rho\rangle^{RAB}.$$

In the above, $U_{\mathcal{N}}^{A \to B'E}$ is an isometric extension of the channel $\mathcal{N}$ and $|\psi_\rho\rangle^{RAB}$ is a purification of the state $\rho^{AB}$. The protocol employs state redistribution [29, 68]. In this case, if $\tfrac{1}{2}\left( I(B'; E)_\omega - I(B'; B)_\omega \right)$ is negative, then the protocol is generating entanglement rather than consuming it. A special case of the above theorem is when there is no quantum side information (when $B$ is trivial), in which
case the resource inequality becomes the usual quantum reverse Shannon theorem |24t [TJ [5j [10] : 

$$\tfrac{1}{2} I(R; B')_\omega [q \to q] + \tfrac{1}{2} I(B'; E)_\omega [qq] \geq \langle U_{\mathcal{N}}^{A \to B'E} : \rho^{A} \rangle.$$

Suppose that we instead would like to simulate the action of a quantum instrument $\mathcal{N}^{A \to XB'}$ with classical output $X$ and quantum output $B'$ on the bipartite state $\rho^{AB}$. A quantum instrument is the most general model for quantum measurement that includes both a classical output and a post-measurement quantum state [21]. A quantum instrument always admits the following decomposition

$$\mathcal{N}^{A \to XB'}(\rho) = \sum_x |x\rangle\langle x|^X \otimes \mathcal{N}_x^{A \to B'}(\rho),$$

in terms of the completely positive trace-nonincreasing maps $\mathcal{N}_x^{A \to B'}$, such that the overall quantum map after tracing over the classical system $X$ is a completely positive trace-preserving map:

$$\sum_x \mathrm{Tr}\{\mathcal{N}_x^{A \to B'}(\rho)\} = 1.$$

The instrument $\mathcal{N}^{A \to XB'}$ has the following isometric extension:

$$U_{\mathcal{N}}^{A \to XX_EB'E} = \sum_x |x\rangle^X |x\rangle^{X_E} \otimes U_{\mathcal{N}_x}^{A \to B'E},$$

where $U_{\mathcal{N}_x}^{A \to B'E}$ is an extension of the map $\mathcal{N}_x^{A \to B'}$. Thus, tracing over $X_E$ and $E$ recovers the action of the original instrument.

If we are interested in simulating this channel on the state $\rho^{AB}$, we could straightforwardly apply the quantum reverse Shannon theorem to show that the following resource inequality exists

$$\tfrac{1}{2} I(R; B'X | B)_\omega [q \to q] + \tfrac{1}{2}\left( I(B'X; E)_\omega - I(B'X; B)_\omega \right)[qq] \geq \langle U_{\mathcal{N}}^{A \to XX_EB'E} : \rho^{AB} \rangle. \tag{82}$$

However, we could perform this task by using less quantum communication and entanglement if we exploit MC-QSI first, followed by state redistribution (as we do in the classically-assisted state redistribution protocol). The first step of the protocol implements the following resource inequality

$$\langle \rho^{AB} \rangle + I(X; R|B)[c \to c] + H(X|RB)[cc] \geq \langle \Delta^{X \to X_A X_B} \circ \mathcal{N}^{A \to B'XE} : \rho^{AB} \rangle,$$

while the second is as follows:

$$\langle \sigma^{B'X_AE | BX_B} \rangle + \tfrac{1}{2} I(B'; R | BX)[q \to q] + \tfrac{1}{2}\left( I(B'; E | X) - I(B'; B | X) \right)[qq] \geq \langle \sigma^{X_AE | B'BX_B} \rangle.$$

Overall, the resource inequality for simulating the quantum instrument is as follows:

$$\langle \rho^{AB} \rangle + I(X; R|B)[c \to c] + H(X|RB)[cc] + \tfrac{1}{2} I(B'; R | BX)[q \to q] + \tfrac{1}{2}\left( I(B'; E | X) - I(B'; B | X) \right)[qq] \geq \langle U_{\mathcal{N}}^{A \to XX_EB'E} : \rho^{AB} \rangle,$$

which is a cheaper simulation than in (82) because we are using classical communication and common randomness to achieve part of the task, rather than quantum communication and entanglement for the whole protocol. One would expect to have such a savings, since a quantum instrument has both a classical and quantum output. We remark that this approach is very similar to classically-assisted state redistribution from the previous section, with the exception that we require the common randomness since the goal is to simulate the instrument in full, rather than to distill entanglement. A special case of the above reverse Shannon theorem occurs when there is no quantum side information available, in which case the resource inequality reduces to

$$I(X; R)[c \to c] + H(X|R)[cc] + \tfrac{1}{2} I(B'; R | X)[q \to q] + \tfrac{1}{2} I(B'; E | X)[qq] \geq \langle U_{\mathcal{N}}^{A \to XX_EB'E} : \rho^{A} \rangle.$$

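The size of the quantum communication savings can be read off from the chain rule for conditional mutual information. The following short calculation is a sketch that compares only the $[q \to q]$ coefficients of (82) and of the combined MC-QSI plus state redistribution protocol; it does not address the entanglement terms.

```latex
\begin{align*}
\tfrac{1}{2} I(R; B'X | B)
  &= \tfrac{1}{2} I(R; X | B) + \tfrac{1}{2} I(R; B' | BX), \\
\tfrac{1}{2} I(R; B'X | B) - \tfrac{1}{2} I(B'; R | BX)
  &= \tfrac{1}{2} I(X; R | B) \;\geq\; 0 .
\end{align*}
```

Read this way, the combined protocol trades $\tfrac{1}{2} I(X;R|B)$ qubits of quantum communication per copy for $I(X;R|B)$ bits of classical communication.
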
5.6.3 Classical communication cost in local purity distillation 

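We can also exploit MC-QSI to improve upon the classical communication cost in local purity distillation [41]. This leads to the following improvement of Theorem 1 of Ref. [41]:

Theorem 13 The 1-way distillable local purity of the state $\rho^{AB}$ is given by $\kappa_{\rightarrow}(\rho^{AB}, R) = \kappa_{\rightarrow}^{*}(\rho^{AB}, R)$, where

$$\kappa_{\rightarrow}^{*}(\rho^{AB}, R) = \kappa(\rho^{A}) + \kappa(\rho^{B}) + P_{\rightarrow}(\rho^{AB}, R).$$

In the above, we have the definitions

$$\kappa(\omega^{C}) = \log d_C - H(C)_\omega,$$

$$P_{\rightarrow}(\rho^{AB}, R) = \lim_{k \to \infty} \frac{1}{k} P^{(1)}\!\left( (\rho^{AB})^{\otimes k}, kR \right),$$

and

$$P^{(1)}(\rho^{AB}, R) = \max_{\Lambda} \{ I(Y; B)_\sigma : I(Y; E | B)_\sigma \leq R \},$$

where $\psi^{ABE}$ is a purification of $\rho^{AB}$, $\mathcal{M}_\Lambda$ is a measurement map corresponding to the POVM $\Lambda$, and the maximization is over all POVMs mapping Alice's system $A$ to a classical system $Y$.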

The improvement of Theorem 1 of Ref. [41] comes about by reducing the classical communication rate from $I(Y; EB)$ to $I(Y; E|B)$ by employing the MC-QSI protocol in the achievability part. The converse part of the theorem (in (19) of Ref. [41]) gets improved as follows:

$$nR = \log d_Y \geq H(Y) \geq H(Y | B^n) \geq I(Y; E^n | B^n).$$

It is apparent that the multi-letter nature of the converse theorem is what led to the ability to 
improve the theorem, so that for any finite k, the above revision of the theorem improves upon the 
previous one, but they are both optimal in the regularized limit. This leads us to believe that even 
further improvements might be possible. 



5.7 Entropic uncertainty relation with QSI 

We close by relating the MC-QSI protocol to recent work on an entropic uncertainty relation in 
the presence of quantum memory |52^ [9J 159] \TQ I31j . This uncertainty relation characterizes the 
ability of two parties to predict the outcomes of measurements on another system, by exploiting 
the quantum systems in their possession. The formal statement of the uncertainty relation applies 
to a tripartite state p ABC and is as follows: 

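$$H(X|B) + H(Z|C) \geq \log_2(1/c_1). \tag{83}$$

The two entropies are with respect to the following states resulting from applying measurement maps for $\Lambda$ and $\Gamma$ to the $A$ system:

$$\sum_x |x\rangle\langle x|^X \otimes \mathrm{Tr}_{AC}\{\Lambda_x^A \rho^{ABC}\},$$

$$\sum_z |z\rangle\langle z|^Z \otimes \mathrm{Tr}_{AB}\{\Gamma_z^A \rho^{ABC}\},$$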

and c characterizes the non-commutativity of the measurements: 



ci = max 

x,z 



A*VT 



This uncertainty relation is useful conceptually, but it also has operational applications to quantum key distribution [9, 59], in relating data compression to privacy amplification [50, 54], and in constructing capacity-achieving quantum error correction codes (for certain channels) [62] because it is formulated in terms of entropies. Another statement of the above entropic uncertainty relation
is as follows [H [31] : 

$$H(X|B) + H(Z|B) \geq \log_2(1/c_2) + H(A|B), \tag{84}$$

where 

c 2 = max v / t7{AT^} . 

x,z 

Here, we show how the above uncertainty relations apply in bounding from below the nonlocal classical resources required in two different MC-QSI protocols. First, suppose that Alice would like to simulate the measurement $\{\Lambda_x^A\}$ on the state $\rho^{ABC}$ and send the outcomes to Bob. Let $R$ be a system that purifies the state $\rho^{ABC}$. Then the MC-QSI protocol corresponds to the following resource inequality:

$$\langle \rho^{ABC} \rangle + I(X; RC|B)[c \to c] + H(X|RBC)[cc] \geq \langle \Lambda^A : \rho^{ABC} \rangle,$$

where the entropies are with respect to the state

$$\sum_x |x\rangle\langle x|^X \otimes \mathrm{Tr}_A\{\Lambda_x^A \psi^{RABC}\}.$$

The total classical cost of the above protocol is $H(X|B) = I(X; RC|B) + H(X|RBC)$. For the second protocol, suppose that Alice would like to simulate the measurement $\{\Gamma_z^A\}$ on the state $\rho^{ABC}$ and send the outcomes to Charlie. Then the MC-QSI protocol in this case corresponds to the following resource inequality:

$$\langle \rho^{ABC} \rangle + I(Z; RB|C)[c \to c] + H(Z|RBC)[cc] \geq \langle \Gamma^A : \rho^{ABC} \rangle,$$

where the entropies are with respect to the state

$$\sum_z |z\rangle\langle z|^Z \otimes \mathrm{Tr}_A\{\Gamma_z^A \psi^{RABC}\}.$$

The total classical cost of the above protocol is $H(Z|C) = I(Z; RB|C) + H(Z|RBC)$.

Using the entropic uncertainty relation in (83), we can then bound from below the total classical cost of the above protocols as follows:

$$I(X; RC|B) + H(X|RBC) + I(Z; RB|C) + H(Z|RBC) = H(X|B) + H(Z|C) \geq \log_2(1/c_1).$$

We can also apply the uncertainty relation in (84) to bound from below the total common randomness cost:

$$\begin{aligned}
H(X|RBC) + H(Z|RBC) &\geq \log_2(1/c_2) + H(A|RBC) \\
&= \log_2(1/c_2) - H(A),
\end{aligned}$$

where the last equality follows because the state on $RABC$ is pure. Since this lower bound might sometimes be negative but the entropies $H(X|RBC)$ and $H(Z|RBC)$ are always non-negative, we can revise the above lower bound to be as follows:

$$H(X|RBC) + H(Z|RBC) \geq \max\{\log_2(1/c_2) - H(A),\, 0\}.$$
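The purity step used above can be spelled out in one line. The following is a worked expansion using only the definition of conditional entropy, the fact that a pure state has zero entropy, and the fact that complementary subsystems of a pure state have equal entropies:

```latex
\begin{align*}
H(A | RBC) &= H(RABC) - H(RBC) \\
           &= 0 - H(A) \\
           &= -H(A).
\end{align*}
```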

Given that lower bounds on the total classical cost and the total common randomness exist, one might be tempted to think that a lower bound on the total information should exist as well. One might conjecture it to be of the following form:

$$I(X; RC|B) + I(Z; RB|C) \geq l,$$

where $l$ is some non-negative parameter that depends only on the measurements and not on the state. Such a universal, state-independent lower bound cannot hold in general, however. A simple counterexample demonstrates that the following lower bound from strong subadditivity is the best that one might hope for:

$$I(X; RC|B) + I(Z; RB|C) \geq 0.$$

Indeed, suppose that $\rho^{ABC}$ is a pure product state. Then $I(X; RC|B)$ is equal to zero because $R$ and $C$ have no correlations with the measurement output $X$ on $A$, and $I(Z; RB|C) = 0$ for a similar reason.

6 Non-feedback measurement compression with quantum side information



Our final contribution concerns measurement compression with quantum side information, in the case where the sender is not required to obtain the outcome of the simulation, that is, a non-feedback simulation. We construct a protocol for this task by simply combining elements of other protocols described earlier in the article. Moreover, we show that the protocol is optimal by proving a single-letter converse for the associated rate region. We omit the detailed definition of the information processing task here because it is the obvious non-feedback relaxation along the lines of Section 3 for the definition of the MC-QSI task from Section 5.

Theorem 14 (Non-feedback MC-QSI) Let $\rho^{AB}$ be a source state and $\mathcal{N}$ a quantum instrument to simulate on this state:

$$(\mathcal{N}^{A \to AX} \otimes I^B)(\rho^{AB}) = \sum_x (\mathcal{N}_x^A \otimes I^B)(\rho^{AB}) \otimes |x\rangle\langle x|^X.$$

There exists a protocol for faithful non-feedback simulation of the quantum instrument with classical communication rate $R$ and common randomness rate $S$ if and only if $R$ and $S$ are in the union of the following regions:

$$\begin{aligned}
R &\geq I(W; R|B), \\
R + S &\geq I(W; XR|B),
\end{aligned} \tag{85}$$

where the entropies are with respect to a state of the following form:

$$\sum_{x,w} p_{X|W}(x|w)\, |w\rangle\langle w|^W \otimes |x\rangle\langle x|^X \otimes \mathrm{Tr}_A\{(I^R \otimes \mathcal{M}_w^A \otimes I^B)(\phi^{RAB})\}, \tag{86}$$

$\phi^{RAB}$ is some purification of the state $\rho^{AB}$, and the union is with respect to all decompositions of the original instrument $\mathcal{N}$ of the form:

$$(\mathcal{N}^{A \to AX} \otimes I^B)(\rho^{AB}) = \sum_{x,w} p_{X|W}(x|w)\, (\mathcal{M}_w^A \otimes I^B)(\rho^{AB}) \otimes |x\rangle\langle x|^X. \tag{87}$$

While demonstrating achievability of the quoted rates will consist of the routine combination of elements from other parts of the article, the converse is more subtle. In particular, a general protocol for non-feedback MC-QSI will have Bob perform an instrument on the $B^n$ system. Arguing that it is sufficient to restrict to states of the form (86) will involve comparing that protocol to a related simulation in which the instrument is implemented approximately by Alice. While the modified protocol would generally require more communication than the original, for the purposes of the converse, it need not significantly increase the relevant mutual informations.

Proof Sketch of Achievability. The protocol for achievability naturally combines elements of protocols that we have considered in Section 3 for non-feedback measurement compression and in Section 5 for measurement compression with quantum side information. The protocol begins with Alice and Bob sharing many copies of a state $\rho^{AB}$. They would like to simulate an instrument $\mathcal{N}^{A \to AX}$, composed of the completely positive, trace non-increasing maps $\{\mathcal{N}_x^A\}$, so that they end up with many copies of a state of the following form:

$$(\mathcal{N}^{A \to AX} \otimes I^B)(\rho^{AB}) = \sum_x (\mathcal{N}_x^A \otimes I^B)(\rho^{AB}) \otimes |x\rangle\langle x|^X.$$

We omit the details of the proof of the achievability part because it follows readily from the methods detailed in Sections 3 and 5. Instead, we state the achievability part as the following resource inequality:

$$\langle \rho^{AB} \rangle + I(W; R|B)[c \to c] + I(W; X|RB)[cc] \geq \langle \mathcal{N}^{A \to AX}(\rho^{AB}) \rangle, \tag{88}$$

where the information quantities are with respect to a state of the following form:

$$\sum_{x,w} p_{X|W}(x|w)\, |w\rangle\langle w|^W \otimes |x\rangle\langle x|^X \otimes \mathrm{Tr}_A\{(I^R \otimes \mathcal{M}_w^A \otimes I^B)(\phi^{RAB})\}. \tag{89}$$

In the above, $\phi^{RAB}$ is a purification of the state $\rho^{AB}$ and the maps $\{\mathcal{M}_w^A\}$ arise from a decomposition of the original instrument into the following form:

$$\sum_x \mathcal{N}_x^A(\sigma) \otimes |x\rangle\langle x|^X = \sum_{x,w} p_{X|W}(x|w)\, \mathcal{M}_w^A(\sigma) \otimes |x\rangle\langle x|^X,$$

when acting on some arbitrary state $\sigma$. In particular, the protocol operates by Alice and Bob performing a simulation of the maps $\{\mathcal{M}_w^A\}$, though Alice hashes the outcome of the simulated measurement. She sends the hash along to Bob using noiseless classical bit channels, and he then performs sequential decoding to search among all of the post-measurement states that are consistent with the hash and his share of the common randomness. This causes a negligible disturbance to the shared state in the asymptotic limit as long as the communication rates are as in (88). Finally, he simulates the classical post-processing channel $p_{X|W}(x|w)$ locally, leading to a savings in the cost of common randomness consumption. ■
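As a consistency check between the resource inequality (88) and the rate region (85), the chain rule for conditional mutual information shows that the rate pair read off from (88) sits at the corner of the region. The short calculation below is a sketch of that bookkeeping; the remark about the case $W = X$ assumes that the post-processing channel is the identity, so that $X$ is a classical copy of $W$.

```latex
\begin{align*}
I(W; XR | B) &= I(W; R | B) + I(W; X | RB), \\
\text{so } (R, S) &= \big( I(W; R|B),\; I(W; X|RB) \big)
  \text{ satisfies both bounds in (85) with equality.} \\
\text{If } W = X: \quad I(W; X | RB) &= H(X | RB),
  \quad I(W; R | B) = I(X; R | B).
\end{align*}
```

In that special case the rates reduce to $I(X;R|B)$ and $H(X|RB)$, the same quantities that appear for feedback MC-QSI earlier in the text.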

With the achievability part in hand, we now move on to the proof of the converse.

Proof of Converse. A modification of Figure 7 (without the extra processing of $L$ and $M$ on Alice's side) depicts the most general protocol for a non-feedback simulation of the measurement with QSI. The protocol begins with the reference, Alice, and Bob sharing many copies of the state $\phi^{RAB}$ and Alice sharing common randomness $M$ with Bob. She then chooses a quantum instrument $T^{(m)}$ based on the common randomness $M$ and performs it on her systems $A^n$. The measurement returns outcome $L$, and the overall state is as follows:

$$\theta^{R^n A^n B^n L M} \equiv \sum_{l,m} \frac{1}{|\mathcal{M}|} (T_l^{(m)})^{A^n}\!\left( (\phi^{RAB})^{\otimes n} \right) \otimes |l\rangle\langle l|^L \otimes |m\rangle\langle m|^M,$$

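where $T_l^{(m)}$ is a completely positive, trace non-increasing map. Alice sends the register $L$ to Bob. Based on $L$ and $M$, he performs some quantum instrument on his systems $B^n$ with trace non-increasing maps $\{\mathcal{F}_s^{(l,m)}\}$ followed by the stochastic map $p_{X^n|SLM}(x^n|s,l,m)$ to give his estimate $x^n$ of the measurement outcome. The resulting state is as follows:

$$\omega^{R^n A^n B^n L M S X^n} \equiv \sum_{l,m,s,x^n} \frac{1}{|\mathcal{M}|}\, p_{X^n|SLM}(x^n|s,l,m) \left( (T_l^{(m)})^{A^n} \otimes (\mathcal{F}_s^{(l,m)})^{B^n} \right)\!\left( (\phi^{RAB})^{\otimes n} \right) \otimes |l\rangle\langle l|^L \otimes |m\rangle\langle m|^M \otimes |s\rangle\langle s|^S \otimes |x^n\rangle\langle x^n|^{X^n}.$$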

The following condition should hold for all $\epsilon > 0$ and sufficiently large $n$ for a faithful non-feedback simulation:



i < e, (90) 



where X n is a classical register isomorphic to X n . 
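We prove the first bound as follows:

$$\begin{aligned}
nR &\geq H(L)_\theta \\
&\geq I(L; MB^nR^n)_\theta \\
&= I(LMB^n; R^n)_\theta + I(L; MB^n)_\theta - I(R^n; MB^n)_\theta \\
&\geq I(LMB^n; R^n)_\theta - I(R^n; B^n)_\theta \\
&\geq I(LMSB^n; R^n)_\omega - I(R^n; B^n)_\omega - n\epsilon' \\
&= H(R^n | B^n)_\omega - H(R^n | LMSB^n)_\omega - n\epsilon' \\
&\geq \sum_k \left[ H(R_k | B_k)_\omega - H(R_k | LMSB_k)_\omega \right] - n2\epsilon' \\
&= \sum_k I(LMS; R_k | B_k)_\omega - n2\epsilon' \\
&= n I(LMS; R | BK)_\sigma - n2\epsilon' \\
&\geq n I(LMS; R | BK)_\sigma + n I(R; K | B)_\sigma - n3\epsilon' \\
&= n I(LMSK; R | B)_\sigma - n3\epsilon'.
\end{aligned}$$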

The first two inequalities are similar to what we had before. The first equality is an identity for quantum mutual information. The third inequality follows because $I(L; MB^n)_\theta \geq 0$ and there are no correlations between $R^nB^n$ and $M$, so that $I(MB^n; R^n)_\theta = I(B^n; R^n)_\theta$. The fourth inequality follows from quantum data processing of $LMB^n$ to produce $LMSB^n$ and from the fact that this does not change the state too much (we apply the condition in (90) and the Alicki-Fannes' inequality). The second equality is an identity for quantum mutual information. The fifth inequality follows from strong subadditivity of quantum entropy:

$$H(R^n | LMSB^n)_\omega \leq \sum_k H(R_k | LMSB_k)_\omega,$$

and because the state on $R^nB^n$ is close to a tensor-power state so that by Lemma 10, we have

$$H(R^n | B^n)_\omega \geq \sum_k H(R_k | B_k)_\omega - n\epsilon'.$$

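The third equality is another identity. The fourth equality comes about by defining the state $\sigma$ as follows:

$$\sigma^{RABLMSXK} \equiv \sum_{l,m,k,x,s} \frac{1}{n|\mathcal{M}|}\, p_{X_k|LMS}(x|lms)\, \mathrm{Tr}_{(RAB)_1^{k-1}(RAB)_{k+1}^{n}}\!\left\{ \left( (T_l^{(m)})^{A^n} \otimes (\mathcal{F}_s^{(l,m)})^{B^n} \right)\!\left( (\phi^{RAB})^{\otimes n} \right) \right\} \otimes |l\rangle\langle l|^L \otimes |m\rangle\langle m|^M \otimes |s\rangle\langle s|^S \otimes |x\rangle\langle x|^X \otimes |k\rangle\langle k|^K, \tag{91}$$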
where the distribution $p_{X_k|LMS}(x|lms)$ is defined from $p_{X^n|LMS}(x^n|lms)$ by keeping only the $k$th symbol from $x^n$. It also follows by exploiting the fact that $K$ is a uniform classical random variable, with distribution $1/n$, determining which systems $R_kA_kB_kX_k$ to select. From the fact that the measurement simulation is faithful, we can apply the Alicki-Fannes' inequality to conclude that

$$I(RX; K | B)_\sigma = \left| I(RX; K | B)_\sigma - I(RX; K | B)_\tau \right| \leq \epsilon', \tag{92}$$

where $\tau$ is a state like $\sigma$ but resulting from the tensor-power state for ideal measurement compression (and due to its IID structure, it has no correlations with any particular system $k$ so that $I(RX; K | B)_\tau = 0$). The same reasoning along with strong subadditivity also implies that

$$I(R; K | B)_\sigma \leq \epsilon'. \tag{93}$$

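The final equality is an application of the chain rule for quantum mutual information. The state $\sigma$ for the final information term has the form:

$$\mathcal{M}^{AB \to ABX}(\phi^{RAB}) = \sum_{x,w} p_{X|W}(x|w)\, |x\rangle\langle x|^X \otimes \mathcal{M}_w^{AB}(\phi^{RAB}), \tag{94}$$

with $LMSK = W$ and the completely positive, trace non-increasing maps $\mathcal{M}_w^{AB}$ defined by

$$\mathcal{M}_w^{AB}(\rho^{AB}) \equiv \frac{1}{n|\mathcal{M}|}\, \mathrm{Tr}_{(AB)_1^{k-1}(AB)_{k+1}^{n}}\!\left\{ \left( (T_l^{(m)})^{A^n} \otimes (\mathcal{F}_s^{(l,m)})^{B^n} \right)\!\left( (\phi^{AB})^{\otimes (k-1)} \otimes \rho^{AB} \otimes (\phi^{AB})^{\otimes (n-k)} \right) \right\}.$$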

At this point, we have proved the first inequality in (85) for a state of the form in (94) where the map $\mathcal{M}_w^{AB}$ acts on the joint system $AB$. We now show that it is possible to construct from $\mathcal{M}_w^{AB}$ a map acting only on the system $A$ (as stated in the theorem) causing only a negligible change to the information quantity in (85). The idea behind this is a simple application of Uhlmann's theorem. First, consider that the following inequality holds from the condition in (90) and from monotonicity of trace distance:

$$\left\| \sum_{x,w} p_{X|W}(x|w)\, \mathrm{Tr}_A\{\mathcal{M}_w^{AB}(\phi^{RAB})\} - \sum_x \mathrm{Tr}_A\{\mathcal{N}_x^A(\phi^{RAB})\} \right\|_1 \leq \epsilon.$$

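A purification of the state $\sum_x \mathrm{Tr}_A\{\mathcal{N}_x^A(\phi^{RAB})\}$ is the state $\phi^{RAB}$, and a purification of the state $\sum_{x,w} p_{X|W}(x|w)\, \mathrm{Tr}_A\{\mathcal{M}_w^{AB}(\phi^{RAB})\}$ is the following state:

$$\sum_{w,x,i} \sqrt{p_{X|W}(x|w)}\; M_{w,i}^{AB} |\phi\rangle^{RAB} \otimes |w\rangle^W |x\rangle^X |i\rangle^I, \tag{95}$$

where we assume that the completely positive maps $\mathcal{M}_w^{AB}$ have the following Kraus representation:

$$\mathcal{M}_w^{AB}(\rho^{AB}) = \sum_i M_{w,i}^{AB}\, \rho^{AB}\, (M_{w,i}^{AB})^\dagger.$$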

By Uhlmann's theorem, there exists an isometry jj a ^ awxi such that the trace distance between 
U A ^ AWXI ((f) RAB ) and the state in ([95|) is less than lyfe. To have the map M AB ^ ABX be im- 
plemented solely on Alice's system, we can simply perform the isometry jj A ^ rAWXI ; discard the 



(38 



register I, and perform von Neumann measurements of the registers W and X. (One could also 
discard register X, and then process W with Px\wi x \ w ) to produce X — it is possible to do this 
since X is classical.) This amounts to an approximate implementation of the following instrument: 



$$\sum_{x,w} p_{X|W}(x|w)\, |x\rangle\langle x|^{X} \otimes |w\rangle\langle w|^{W} \otimes \mathcal{M}_w^{A}(\phi^{RAB}),$$



from which we can discard register $W$ to obtain an approximation of the map $\mathcal{M}^{AB \to ABX}$. Thus,
from the map $\mathcal{M}^{AB \to ABX}$, it is possible to construct a nearby map of the form in (87), so that it
suffices to optimize over the class of decompositions given in (87).
We now prove the second bound: 



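$$\begin{aligned}
n(R + S) &\geq H(LM)_\theta \\
&\geq H(LM|B^n)_\theta \\
&\geq I(LM; X^n R^n | B^n)_\theta \\
&= I(LMB^n; X^n R^n)_\theta - I(B^n; X^n R^n)_\theta \\
&\geq I(LMSB^n; X^n R^n)_\omega - I(B^n; X^n R^n)_\omega - n\epsilon' \\
&= H(X^n R^n | B^n)_\omega - H(X^n R^n | LMSB^n)_\omega - n\epsilon' \\
&\geq \sum_k \left[ H(X_k R_k | B_k)_\omega - H(X_k R_k | LMSB_k)_\omega \right] - n2\epsilon' \\
&= \sum_k I(LMS; X_k R_k | B_k)_\omega - n2\epsilon' \\
&= n I(LMS; XR | KB)_\sigma - n2\epsilon' \\
&\geq n I(LMS; XR | KB)_\sigma + n I(K; XR | B)_\sigma - n3\epsilon' \\
&= n I(LMSK; XR | B)_\sigma - n3\epsilon'.
\end{aligned}$$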



The first three inequalities follow for reasons similar to those for our previous inequalities. The first equality
is an identity. The fourth inequality follows from quantum data processing of $LMB^n$ to produce
$LMSB^n$ and from the fact that this does not change the state too much (we apply the condition
in (90) and the Alicki-Fannes inequality). The fifth inequality follows from strong subadditivity of
entropy:

$$H(X^n R^n | LMSB^n)_\omega \leq \sum_k H(X_k R_k | LMSB_k)_\omega,$$

and from the fact that the measurement simulation is faithful, so that

$$\left| H(X^n R^n | B^n)_\omega - \sum_k H(X_k R_k | B_k)_\omega \right| \leq n\epsilon',$$

where we have applied a variation of Lemma 10. The third equality is an identity. The fourth
equality follows by considering the state $\sigma$ as defined in (91). The sixth inequality follows from
(92). The final equality is the chain rule for quantum mutual information. We can then apply
the same argument as before in order to construct a map acting only on $A$ from one acting
on $AB$. Similarly, the resulting state has the form in (87).
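
The chain rule used in the final step, $I(LMSK; XR|B) = I(K; XR|B) + I(LMS; XR|KB)$, can be sanity-checked numerically in the purely classical case. The following is only an illustrative sketch under stated assumptions: every system is replaced by a binary random variable with a randomly drawn joint distribution, and the helper names (`entropy`, `marginal`, `cond_mutual_info`) are hypothetical and do not come from the paper.

```python
import itertools
import math
import random


def entropy(p):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def marginal(p, keep):
    """Marginal of a joint distribution on the coordinate positions in `keep`."""
    out = {}
    for outcome, q in p.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + q
    return out


def cond_mutual_info(p, a, b, c):
    """I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C), with a, b, c lists of positions."""
    h = lambda idx: entropy(marginal(p, sorted(idx)))
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


random.seed(0)
# Assumed classical stand-ins (all binary):
# position 0 = LMS, position 1 = K, position 2 = XR, position 3 = B.
outcomes = list(itertools.product(range(2), repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

lhs = cond_mutual_info(joint, [0, 1], [2], [3])       # I(LMSK; XR | B)
rhs = (cond_mutual_info(joint, [1], [2], [3])         # I(K; XR | B)
       + cond_mutual_info(joint, [0], [2], [1, 3]))   # I(LMS; XR | KB)
print(abs(lhs - rhs) < 1e-9)
```

The quantum statement has the same form, with von Neumann entropies in place of Shannon entropies.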
7 Conclusion




This paper provided a review of Winter's measurement compression theorem [65], detailing the
information processing task, providing examples for understanding it, reviewing Winter's
achievability proof, and detailing a new approach to its single-letter converse theorem. We proved a new
theorem characterizing the optimal rates for classical communication and common randomness for
a measurement compression protocol where the sender is not required to obtain the outcome of the
measurement simulation. We then reviewed the Devetak-Winter theorem on classical data
compression with quantum side information, providing new proofs of the achievability and converse
parts of this theorem. From there, we presented a new protocol called measurement compression
with quantum side information (a protocol first announced in Ref. [38]). This protocol has several
applications, including its part in the "classically-assisted state redistribution" protocol, which is
the most general protocol on the static side of the quantum information theory tree, and its role in
reducing the classical communication cost in local purity distillation [H]. We then outlined a
connection between this protocol and recent work in entropic uncertainty relations. Finally, we proved
a single-letter theorem for the task of measurement compression with quantum side information
when the sender is not required to obtain the outcome of the measurement simulation.

There are several open questions to consider going forward from here. First, are there
applications of the MC-QSI protocol to rate distortion, as was the case for the Luo-Devetak protocol in
Ref. [44]? Are there further applications of the measurement compression protocol in general? Is
it possible to formulate a measurement compression protocol that is independent of the state on
which it acts (similar to the general reverse Shannon theorem from Refs. [3, 10])? The answers to
these questions could further illuminate our understanding of quantum measurement and address
other important areas of quantum information theory.

A Typical sequences and typical subspaces

A sequence $x^n$ is typical with respect to some probability distribution $p_X(x)$ if its empirical
distribution has maximum deviation $\delta$ from $p_X(x)$. The typical set $T_\delta^{X^n}$ is the set of all such sequences:

$$T_\delta^{X^n} \equiv \left\{ x^n : \left| \frac{1}{n} N(x|x^n) - p_X(x) \right| \leq \delta \ \ \forall x \in \mathcal{X} \right\},$$

where $N(x|x^n)$ counts the number of occurrences of the letter $x$ in the sequence $x^n$. The above
notion of typicality is the "strong" notion (as opposed to the weaker "entropic" version of typicality
sometimes employed [15]). The typical set enjoys three useful properties: its probability approaches
unity in the large $n$ limit, it has exponentially smaller cardinality than the set of all sequences,
and every sequence in the typical set has approximately uniform probability. That is, suppose
that $X^n$ is a random variable distributed according to $p_{X^n}(x^n) = p_X(x_1) \cdots p_X(x_n)$, $\epsilon$ is a positive
number that becomes arbitrarily small as $n$ becomes large, and $c$ is some positive constant. Then
the following three properties hold [15]:



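$$\Pr\{X^n \in T_\delta^{X^n}\} \geq 1 - \epsilon, \tag{96}$$
$$|T_\delta^{X^n}| \leq 2^{n[H(X) + c\delta]}, \tag{97}$$
$$\forall x^n \in T_\delta^{X^n} : \quad 2^{-n[H(X) + c\delta]} \leq p_{X^n}(x^n) \leq 2^{-n[H(X) - c\delta]}. \tag{98}$$

We omit using $c$ in the main text and instead subsume it as part of $\delta$.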
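
For a small alphabet and block length, properties (96)-(98) can be checked by brute-force enumeration. The sketch below is an illustration only: the binary distribution, the values of `n` and `delta`, and the function names are arbitrary choices, and the constant is taken to be $c = \sum_x |\log_2 p_X(x)|$, which is one valid choice for strong typicality rather than the constant used in [15].

```python
import itertools
import math


def strongly_typical_set(p, n, delta):
    """All length-n sequences whose empirical distribution is within delta of p for every letter."""
    letters = sorted(p)
    return [seq for seq in itertools.product(letters, repeat=n)
            if all(abs(seq.count(x) / n - p[x]) <= delta for x in letters)]


def sequence_probability(p, seq):
    """Probability of an i.i.d. sequence under the distribution p."""
    return math.prod(p[x] for x in seq)


# Illustrative parameters (assumptions, not taken from the paper).
p = {0: 0.75, 1: 0.25}
n, delta = 16, 0.125

H = -sum(q * math.log2(q) for q in p.values())   # Shannon entropy H(X)
c = sum(abs(math.log2(q)) for q in p.values())   # assumed constant for the bounds

T = strongly_typical_set(p, n, delta)
prob_typical = sum(sequence_probability(p, s) for s in T)   # left side of (96)

size_ok = len(T) <= 2 ** (n * (H + c * delta))              # property (97)
lower = 2 ** (-n * (H + c * delta))
upper = 2 ** (-n * (H - c * delta))
bounds_ok = all(lower <= sequence_probability(p, s) <= upper for s in T)  # property (98)

print(len(T), prob_typical, size_ok, bounds_ok)
```

At such a short block length the typical set still misses a noticeable fraction of the probability; the deficit $\epsilon$ in (96) shrinks only as $n$ grows.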

These properties translate straightforwardly to the quantum setting by applying the spectral
theorem to a density operator $\rho$. That is, suppose that

$$\rho = \sum_x p_X(x) |x\rangle\langle x|,$$

for some orthonormal basis $\{|x\rangle\}_x$. Then there is a typical subspace defined as follows:

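$$T^{n}_{\rho,\delta} \equiv \mathrm{span}\left\{ |x^n\rangle : \left| \frac{1}{n} N(x|x^n) - p_X(x) \right| \leq \delta \ \ \forall x \in \mathcal{X} \right\},$$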



and let $\Pi^n_{\rho,\delta}$ denote the projector onto it. Then properties analogous to (96)-(98) hold for the typical
subspace. The probability that a tensor power state $\rho^{\otimes n}$ is in the typical subspace approaches
unity as $n$ becomes large, the rank of the typical projector is exponentially smaller than the rank
of the full $n$-fold tensor-product Hilbert space of $\rho^{\otimes n}$, and the state $\rho^{\otimes n}$ "looks" approximately
maximally mixed on the typical subspace:

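$$\mathrm{Tr}\{\Pi^n_{\rho,\delta}\, \rho^{\otimes n}\} \geq 1 - \epsilon, \tag{99}$$
$$\mathrm{Tr}\{\Pi^n_{\rho,\delta}\} \leq 2^{n[H(B) + c\delta]}, \tag{100}$$
$$2^{-n[H(B) + c\delta]}\, \Pi^n_{\rho,\delta} \leq \Pi^n_{\rho,\delta}\, \rho^{\otimes n}\, \Pi^n_{\rho,\delta} \leq 2^{-n[H(B) - c\delta]}\, \Pi^n_{\rho,\delta}, \tag{101}$$

where $H(B)$ is the entropy of $\rho$.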
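
Because the typical projector is diagonal in the product eigenbasis of $\rho^{\otimes n}$, its rank and the weight $\mathrm{Tr}\{\Pi^n_{\rho,\delta}\, \rho^{\otimes n}\}$ can be computed for a qubit by counting type classes instead of enumerating basis vectors. The sketch below assumes a qubit state with an illustrative spectrum; the function name, the parameter values, and the small floating-point tolerance `tol` are assumptions made for this example, and logarithms are used to avoid overflow at large $n$.

```python
import math


def typical_projector_stats(eigs, n, delta, tol=1e-12):
    """Return (rank of the typical projector, Tr{Pi rho^(tensor n)}) for a qubit spectrum `eigs`."""
    lam0, lam1 = eigs
    rank, weight = 0, 0.0
    for k in range(n + 1):  # k = number of tensor factors carrying the eigenvector |1>
        typical = (abs(k / n - lam1) <= delta + tol
                   and abs((n - k) / n - lam0) <= delta + tol)
        if typical:
            mult = math.comb(n, k)  # number of basis vectors in this type class
            rank += mult
            # Each basis vector in the class has eigenvalue lam0^(n-k) * lam1^k;
            # the product is evaluated in the log domain to avoid overflow.
            log_term = math.log(mult) + (n - k) * math.log(lam0) + k * math.log(lam1)
            weight += math.exp(log_term)
    return rank, weight


# Illustrative parameters (assumptions, not taken from the paper).
eigs = (0.75, 0.25)
delta = 0.05
H = -sum(l * math.log2(l) for l in eigs)   # entropy of rho
c = sum(abs(math.log2(l)) for l in eigs)   # assumed constant in (100)

results = []
for n in (100, 400, 1600):
    rank, weight = typical_projector_stats(eigs, n, delta)
    results.append((n, weight, math.log2(rank) / n))
    print(n, weight, math.log2(rank) / n, H + c * delta)
```

The printed weight should approach one as $n$ grows, in line with (99), while the normalized log-rank stays below $H(B) + c\delta$, in line with (100).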

Suppose now that we have an ensemble of the form $\{p_X(x), \rho_x\}$, and suppose that we generate a
typical sequence $x^n$ according to the pruned distribution in (24), leading to a tensor product state
$\rho_{x^n} = \rho_{x_1} \otimes \cdots \otimes \rho_{x_n}$. Then there is a conditionally typical subspace with a conditionally typical
projector defined as follows:

$$\Pi^n_{\rho_{x^n},\delta} \equiv \bigotimes_{x \in \mathcal{X}} \Pi^{I_x}_{\rho_x,\delta},$$

where $I_x = \{i : x_i = x\}$ is an indicator set that selects the indices $i$ in the sequence $x^n$ for which the
$i$th symbol $x_i$ is equal to $x \in \mathcal{X}$ and $\Pi^{I_x}_{\rho_x,\delta}$ is the typical projector for the state $\rho_x$. The conditionally
typical subspace has the three following properties:

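$$\mathrm{Tr}\{\Pi^n_{\rho_{x^n},\delta}\, \rho_{x^n}\} \geq 1 - \epsilon, \tag{102}$$
$$\mathrm{Tr}\{\Pi^n_{\rho_{x^n},\delta}\} \leq 2^{n[H(B|X) + c\delta]}, \tag{103}$$
$$2^{-n[H(B|X) + c\delta]}\, \Pi^n_{\rho_{x^n},\delta} \leq \Pi^n_{\rho_{x^n},\delta}\, \rho_{x^n}\, \Pi^n_{\rho_{x^n},\delta} \leq 2^{-n[H(B|X) - c\delta]}\, \Pi^n_{\rho_{x^n},\delta}, \tag{104}$$

where $H(B|X) = \sum_x p_X(x) H(\rho_x)$ is the conditional quantum entropy.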

Let $\rho$ be the expected density operator of the ensemble $\{p_X(x), \rho_x\}$ so that $\rho = \sum_x p_X(x) \rho_x$.
The following properties are proved in Refs. [23, 63, 60]:

$$\forall x^n \in T_\delta^{X^n} : \quad \mathrm{Tr}\{\rho_{x^n}\, \Pi^n_{\rho,\delta}\} \geq 1 - \epsilon,$$



-1 



(105) 



B Useful lemmas 

Here we collect some useful lemmas. 

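Lemma 15 (Gentle Operator Lemma [63, 47]) Let $\Lambda$ be a positive operator where $0 \leq \Lambda \leq I$
(usually $\Lambda$ is a POVM element), $\rho$ a state, and $\epsilon$ a positive number such that the probability of
detecting the outcome $\Lambda$ is high:

$$\mathrm{Tr}\{\Lambda \rho\} \geq 1 - \epsilon.$$

Then the measurement causes little disturbance to the state $\rho$:

$$\left\| \rho - \sqrt{\Lambda}\, \rho\, \sqrt{\Lambda} \right\|_1 \leq 2\sqrt{\epsilon}.$$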
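
The bound in Lemma 15 can be checked numerically on a single qubit. The sketch below is an illustration under stated assumptions: it restricts to real symmetric $2 \times 2$ matrices, takes $\Lambda = \mathrm{diag}(1, t)$ so that its square root is explicit, and sets $\epsilon = 1 - \mathrm{Tr}\{\Lambda\rho\}$, the smallest value permitted by the hypothesis. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def trace_norm_symmetric(m):
    """Trace norm of a real symmetric 2x2 matrix: the sum of absolute eigenvalues."""
    t = trace(m)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    gap = math.sqrt(max(t * t - 4.0 * det, 0.0))
    return abs((t + gap) / 2.0) + abs((t - gap) / 2.0)


def qubit_state(angle, noise):
    """(1 - noise) |psi><psi| + noise * I/2, with |psi> = (cos angle, sin angle)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[(1 - noise) * c * c + noise / 2, (1 - noise) * c * s],
            [(1 - noise) * c * s, (1 - noise) * s * s + noise / 2]]


def gentle_operator_sides(rho, t):
    """For Lambda = diag(1, t), return (epsilon, disturbance, 2 * sqrt(epsilon))."""
    lam = [[1.0, 0.0], [0.0, t]]
    root = [[1.0, 0.0], [0.0, math.sqrt(t)]]        # sqrt(Lambda)
    eps = max(1.0 - trace(matmul(lam, rho)), 0.0)   # smallest admissible epsilon
    post = matmul(matmul(root, rho), root)          # sqrt(Lambda) rho sqrt(Lambda)
    diff = [[rho[i][j] - post[i][j] for j in range(2)] for i in range(2)]
    return eps, trace_norm_symmetric(diff), 2.0 * math.sqrt(eps)


checks = []
for angle in (0.05, 0.2, 0.6, 1.2):
    for noise in (0.0, 0.1, 0.5):
        for t in (0.0, 0.3, 0.9):
            eps, disturbance, bound = gentle_operator_sides(qubit_state(angle, noise), t)
            checks.append(disturbance <= bound + 1e-12)
print(all(checks))
```

For a pure state at a small angle to the first basis vector and $t = 0$, the disturbance comes close to $2\sqrt{\epsilon}$, so the constant in the lemma cannot be improved by much in that regime.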

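Lemma 16 (Gentle Operator Lemma for Ensembles [63, 47, 60]) Given an ensemble $\{p_X(x), \rho_x\}$
with expected density operator $\rho = \sum_x p_X(x) \rho_x$, suppose that an operator $\Lambda$ such that $I \geq \Lambda \geq 0$
succeeds with high probability on the state $\rho$:

$$\mathrm{Tr}\{\Lambda \rho\} \geq 1 - \epsilon.$$

Then the subnormalized state $\sqrt{\Lambda}\, \rho_x \sqrt{\Lambda}$ is close in expected trace distance to the original state $\rho_x$:

$$\mathbb{E}_X\left\{ \left\| \sqrt{\Lambda}\, \rho_X \sqrt{\Lambda} - \rho_X \right\|_1 \right\} \leq 2\sqrt{\epsilon}.$$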

Lemma 17 Let $\rho$ and $\sigma$ be positive operators and $\Lambda$ a positive operator such that $0 \leq \Lambda \leq I$. Then
the following inequality holds:

$$\mathrm{Tr}\{\Lambda \rho\} \leq \mathrm{Tr}\{\Lambda \sigma\} + \|\rho - \sigma\|_1.$$

Lemma 18 (Non-commutative union bound [56]) Let $\sigma$ be a subnormalized state such that
$\sigma \geq 0$ and $\mathrm{Tr}\{\sigma\} \leq 1$. Let $\Pi_1, \ldots, \Pi_N$ be projectors. Then the following "non-commutative union
bound" holds:

$$\mathrm{Tr}\{\sigma\} - \mathrm{Tr}\{\Pi_N \cdots \Pi_1\, \sigma\, \Pi_1 \cdots \Pi_N\} \leq 2 \sqrt{\sum_{i=1}^{N} \mathrm{Tr}\{(I - \Pi_i)\sigma\}}.$$
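
Lemma 18 can likewise be checked numerically for non-commuting rank-one projectors on a qubit. The sketch below is an illustration under stated assumptions: it uses real $2 \times 2$ matrices, a fixed subnormalized state, and an arbitrary list of projector angles; the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def projector(angle):
    """Rank-one projector onto the real unit vector (cos angle, sin angle)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * c, c * s], [c * s, s * s]]


def union_bound_sides(sigma, projectors):
    """Return (left side, right side) of the non-commutative union bound."""
    left = [[1.0, 0.0], [0.0, 1.0]]    # accumulates Pi_N ... Pi_1
    right = [[1.0, 0.0], [0.0, 1.0]]   # accumulates Pi_1 ... Pi_N
    for p in projectors:
        left = matmul(p, left)
        right = matmul(right, p)
    lhs = trace(sigma) - trace(matmul(matmul(left, sigma), right))
    # Sum over i of Tr{(I - Pi_i) sigma}
    missed = sum(trace(sigma) - trace(matmul(p, sigma)) for p in projectors)
    return lhs, 2.0 * math.sqrt(max(missed, 0.0))


# Illustrative subnormalized state: positive semidefinite with trace 0.9.
sigma = [[0.85, 0.05], [0.05, 0.05]]
projectors = [projector(a) for a in (0.1, -0.15, 0.05, 0.3)]

lhs, rhs = union_bound_sides(sigma, projectors)
print(lhs, rhs, lhs <= rhs)
```

The order of the projectors matters on the left side because they do not commute, while the right side depends only on how well each projector individually accepts $\sigma$.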